{ "cells": [ { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "# Python good practices" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "## Environment setup" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "slideshow": { "slide_type": "skip" }, "tags": [ "hide-output" ] }, "outputs": [], "source": [ "!pip install papermill" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Python version: 3.7.5\n" ] } ], "source": [ "import platform\n", "import sys\n", "\n", "print(f\"Python version: {platform.python_version()}\")\n", "# Compare integers, not strings: as strings, '10' sorts before '6'\n", "assert sys.version_info >= (3, 6)\n", "\n", "import os\n", "import papermill as pm\n", "\n", "from IPython.display import YouTubeVideo" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "## Writing pythonic code" ] }, { "cell_type": "code", "execution_count": 3, "metadata": { "slideshow": { "slide_type": "slide" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "The Zen of Python, by Tim Peters\n", "\n", "Beautiful is better than ugly.\n", "Explicit is better than implicit.\n", "Simple is better than complex.\n", "Complex is better than complicated.\n", "Flat is better than nested.\n", "Sparse is better than dense.\n", "Readability counts.\n", "Special cases aren't special enough to break the rules.\n", "Although practicality beats purity.\n", "Errors should never pass silently.\n", "Unless explicitly silenced.\n", "In the face of ambiguity, refuse the temptation to guess.\n", "There should be one-- and preferably only one --obvious way to do it.\n", "Although that way may not be obvious at first unless you're Dutch.\n", "Now is better than never.\n", "Although never is often better than *right* now.\n", "If the implementation is hard to explain, it's a bad idea.\n", "If 
the implementation is easy to explain, it may be a good idea.\n", "Namespaces are one honking great idea -- let's do more of those!\n" ] } ], "source": [ "import this" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "### What does \"Pythonic\" mean?\n", "\n", "- Python code is considered _pythonic_ if it:\n", "  - conforms to the Python philosophy;\n", "  - takes advantage of the language's specific features.\n", "- Pythonic code is nothing more than **idiomatic Python code** that strives to be clean, concise and readable." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "### Example: swapping two variables" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [], "source": [ "a = 3\n", "b = 2\n", "\n", "# Non-pythonic\n", "tmp = a\n", "a = b\n", "b = tmp\n", "\n", "# Pythonic\n", "a, b = b, a" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "### Example: iterating over a list" ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [], "source": [ "my_list = [\"a\", \"b\", \"c\"]\n", "\n", "\n", "def do_something(item):\n", "    # print(item)\n", "    pass\n", "\n", "\n", "# Non-pythonic\n", "i = 0\n", "while i < len(my_list):\n", "    do_something(my_list[i])\n", "    i += 1\n", "\n", "# Still non-pythonic\n", "for i in range(len(my_list)):\n", "    do_something(my_list[i])\n", "\n", "# Pythonic\n", "for item in my_list:\n", "    do_something(item)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "### Example: indexed traversal" ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "0 -> a\n", "1 -> b\n", "2 -> c\n", "0 -> a\n", "1 -> b\n", "2 -> c\n" ] } ], "source": [ "my_list = [\"a\", \"b\", \"c\"]\n", "\n", "# Non-pythonic\n", "for i in 
range(len(my_list)):\n", "    print(i, \"->\", my_list[i])\n", "\n", "# Pythonic\n", "for i, item in enumerate(my_list):\n", "    print(i, \"->\", item)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "### Example: searching in a list" ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [], "source": [ "fruits = [\"apples\", \"oranges\", \"bananas\", \"grapes\"]\n", "fruit = \"cherries\"\n", "\n", "# Non-pythonic\n", "found = False\n", "size = len(fruits)\n", "for i in range(0, size):\n", "    if fruits[i] == fruit:\n", "        found = True\n", "\n", "# Pythonic\n", "found = fruit in fruits" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "### Example: generating a list\n", "\n", "This feature is called [list comprehension](https://docs.python.org/3/tutorial/datastructures.html#list-comprehensions)." ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [], "source": [ "numbers = [1, 2, 3, 4, 5, 6]\n", "\n", "# Non-pythonic\n", "doubles = []\n", "for i in range(len(numbers)):\n", "    if numbers[i] % 2 == 0:\n", "        doubles.append(numbers[i] * 2)\n", "    else:\n", "        doubles.append(numbers[i])\n", "\n", "# Pythonic\n", "doubles = [x * 2 if x % 2 == 0 else x for x in numbers]" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "### Code style\n", "\n", "- [PEP8](https://www.python.org/dev/peps/pep-0008/) is the official style guide for Python:\n", "  - use 4 spaces for indentation;\n", "  - limit line length (PEP8 recommends at most 79 characters);\n", "  - group imports at the top of the file;\n", "  - surround binary operators with a single space on each side;\n", "  - ...\n", "- Code style should be enforced automatically, as the code is written, by a formatter like [black](https://github.com/psf/black)."
] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "### Beyond PEP8\n", "\n", "Focusing on style and PEP8-compliance might make you miss more fundamental code imperfections." ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [ { "data": { "image/jpeg": "\n", "text/html": [ "\n", " \n", " " ], "text/plain": [ "" ] }, "execution_count": 9, "metadata": {}, "output_type": "execute_result" } ], "source": [ "YouTubeVideo(\"wf-BqAjZb8M\")" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "### Docstrings\n", "\n", "A [docstring](https://www.python.org/dev/peps/pep-0257/) is a string literal that occurs as the first statement in a module, function, class, or method definition to document it.\n", "\n", "All modules, classes, public methods and exported functions should include a docstring." ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [], "source": [ "def complex(real=0.0, imag=0.0):\n", "    \"\"\"Form a complex number.\n", "\n", "    Keyword arguments:\n", "    real -- the real part (default 0.0)\n", "    imag -- the imaginary part (default 0.0)\n", "    \"\"\"\n", "    if imag == 0.0 and real == 0.0:\n", "        return complex_zero" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "### Code linting\n", "\n", "- _Linting_ is the process of checking code for syntactical and stylistic problems before execution.\n", "- It is useful to catch errors and improve code quality in dynamically typed, interpreted languages, where there is no compiler.\n", "- Several linters exist in the Python ecosystem. The most commonly used is [pylint](https://pylint.org/)."
] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "### Type annotations\n", "\n", "- Added in Python 3.5, [type annotations](https://www.python.org/dev/peps/pep-0484/) make it possible to add type hints to code entities like variables or functions, bringing a statically typed flavour to the language.\n", "- [mypy](http://mypy-lang.org/) can automatically check the code for annotation correctness." ] }, { "cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [], "source": [ "def greeting(name: str) -> str:\n", "    return \"Hello \" + name\n", "\n", "\n", "# greeting('Alice')  # OK\n", "# greeting(3)  # mypy error: incompatible type \"int\"; expected \"str\"" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "### Unit tests\n", "\n", "Unit tests automate the testing of individual code elements like functions or methods, thus decreasing the risk of bugs and regressions.\n", "\n", "They can be implemented in Python using tools like [unittest](https://docs.python.org/3/library/unittest.html) or [pytest](https://docs.pytest.org)." ] }, { "cell_type": "code", "execution_count": 12, "metadata": {}, "outputs": [], "source": [ "def inc(x):\n", "    return x + 1\n", "\n", "\n", "def test_answer():\n", "    assert inc(3) == 5  # AssertionError: assert 4 == 5" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "## Packaging and dependency management" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "### Managing dependencies in Python\n", "\n", "- Most Python apps depend on third-party libraries and frameworks (NumPy, Flask, Requests...).\n", "- These libraries may in turn have their own dependencies, and so on.\n", "- **Dependency management** is necessary to prevent version conflicts and incompatibilities. 
It involves two things:\n", "  - a way for the app to declare its dependencies;\n", "  - a tool to resolve these dependencies and install compatible versions." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "### Semantic versioning\n", "\n", "- Software versioning convention used in many ecosystems.\n", "- A version number is a sequence of three numbers `X.Y.Z`.\n", "  - X = major version (potentially including breaking changes).\n", "  - Y = minor version (new features, no breaking changes).\n", "  - Z = patch version (backward-compatible bug fixes).\n", "- Numbers are incremented as new versions are shipped." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "### pip and requirements.txt\n", "\n", "A `requirements.txt` file is the most basic way of declaring dependencies in Python.\n", "\n", "```text\n", "certifi>=2020.11.0\n", "chardet==4.0.0\n", "click>=6.5.0, <7.1\n", "download==0.3.5\n", "Flask>=1.1.0\n", "```\n", "\n", "The [pip](https://pypi.org/project/pip/) package installer can read this file and act accordingly, downloading dependencies from [PyPI](https://pypi.org/).\n", "\n", "```bash\n", "pip install -r requirements.txt\n", "```" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "### Virtual environments\n", "\n", "- A **virtual environment** is an isolated Python environment where a project's dependencies are installed.\n", "- Using one avoids mixing dependencies required by different projects on the same machine.\n", "- Several tools exist to manage virtual environments in Python, for example [virtualenv](https://virtualenv.pypa.io) and [conda](https://docs.conda.io)."
] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "### conda and environment.yml\n", "\n", "Installed as part of the [Anaconda](https://www.anaconda.com/) distribution, the [conda](https://docs.conda.io) package manager reads an `environment.yml` file to install the dependencies associated with a specific virtual environment.\n", "\n", "```yaml\n", "name: example-env\n", "\n", "channels:\n", "  - conda-forge\n", "  - defaults\n", "\n", "dependencies:\n", "  - python=3.7\n", "  - matplotlib\n", "  - numpy\n", "```" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "### Poetry\n", "\n", "[Poetry](https://python-poetry.org) is a recent packaging and dependency management tool for Python. It downloads packages from [PyPI](https://pypi.org/) by default.\n", "\n", "```bash\n", "# Create a new poetry-compliant project\n", "poetry new \n", "\n", "# Initialize an already existing project for Poetry\n", "poetry init\n", "\n", "# Install defined dependencies\n", "poetry install\n", "\n", "# Add a package to project dependencies and install it\n", "poetry add \n", "\n", "# Update dependencies to the latest versions allowed by the configuration file\n", "poetry update\n", "```" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "### Poetry and virtual environments\n", "\n", "By default, Poetry creates a virtual environment for the configured project in a user-specific folder. 
A standard practice is to store it in the project's folder.\n", "\n", "```bash\n", "# Tell Poetry to store the environment in the local project folder\n", "poetry config virtualenvs.in-project true\n", "\n", "# Activate the environment\n", "poetry shell\n", "```" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "### The pyproject.toml file\n", "\n", "Poetry's configuration file. `pyproject.toml` is the standard configuration file for Python projects (PEP 518).\n", "\n", "```toml\n", "[tool.poetry]\n", "name = \"poetry example\"\n", "version = \"0.1.0\"\n", "description = \"\"\n", "\n", "[tool.poetry.dependencies]\n", "python = \">=3.7.1,<3.10\"\n", "jupyter = \"^1.0.0\"\n", "matplotlib = \"^3.3.2\"\n", "sklearn = \"^0.0\"\n", "pandas = \"^1.1.3\"\n", "ipython = \"^7.0.0\"\n", "\n", "[tool.poetry.dev-dependencies]\n", "pytest = \"^6.1.1\"\n", "```" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "### Caret requirements\n", "\n", "Caret (`^`) and tilde (`~`) requirements offer a way to precisely define the range of allowed dependency versions.\n", "\n", "| Requirement | Versions allowed |\n", "| :---------: | :--------------: |\n", "| ^1.2.3 | >=1.2.3 <2.0.0 |\n", "| ^1.2 | >=1.2.0 <2.0.0 |\n", "| ~1.2.3 | >=1.2.3 <1.3.0 |\n", "| ~1.2 | >=1.2.0 <1.3.0 |\n", "| 1.2.3 | 1.2.3 only |" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "### The poetry.lock file\n", "\n", "- The first time Poetry installs dependencies, it creates a `poetry.lock` file that contains the exact versions of all installed packages.\n", "- Subsequent installs will use these exact versions to ensure consistency.\n", "- Removing this file and running another Poetry install will fetch the latest matching versions."
] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "## Working with notebooks" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "### Advantages of Jupyter notebooks\n", "\n", "- Standard format for mixing text, images and (executable) code.\n", "- Open source and platform-independent.\n", "- Useful for experimenting and prototyping.\n", "- Growing ecosystem of [extensions](https://tljh.jupyter.org/en/latest/howto/admin/enable-extensions.html) for various purposes and cloud hosting solutions ([Colaboratory](https://colab.research.google.com/), [AI notebooks](https://www.ovhcloud.com/en/public-cloud/ai-notebook/)...).\n", "- Integration with tools like [Visual Studio Code](https://code.visualstudio.com/docs/datascience/jupyter-notebooks)." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "### Drawbacks of Jupyter notebooks\n", "\n", "- Arbitrary execution order of cells can cause confusing errors.\n", "- Notebooks don't encourage good programming habits like modularization, linting and tests.\n", "- Being JSON-based, their versioning is more difficult than for plain text files.\n", "- Dependency management is also difficult, thus hindering reproducibility." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "### Collaborating with notebooks\n", "\n", "A common solution for sharing notebooks within a team is to use [Jupytext](https://jupytext.readthedocs.io). 
This tool can associate an `.ipynb` file with a Python file to facilitate collaboration and version control.\n", "\n", "[![Collaboration example through Jupytext](images/JupyterPyCharm.gif)](https://jupytext.readthedocs.io/en/latest/examples.html)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "### Code organization\n", "\n", "Monolithic notebooks can grow over time and become hard to understand and maintain.\n", "\n", "Just like in a traditional software project, it is possible to split them into separate parts, thus following the [separation of concerns](https://en.wikipedia.org/wiki/Separation_of_concerns) design principle.\n", "\n", "Code can be split into several sub-notebooks and/or external Python files. The latter facilitates unit testing and version control." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "### Notebook workflow\n", "\n", "Tools like [papermill](https://papermill.readthedocs.io) can orchestrate the execution of several notebooks in a row. External parameters can be passed to notebooks, and the runtime flow can depend on the execution results of each notebook." ] }, { "cell_type": "code", "execution_count": 13, "metadata": {}, "outputs": [ { "data": { "application/vnd.jupyter.widget-view+json": { "model_id": "c0a5ec58377a4f3fbfd3504d6862593b", "version_major": 2, "version_minor": 0 }, "text/plain": [ "HBox(children=(IntProgress(value=0, description='Executing', max=4, style=ProgressStyle(description_width='i…" ] }, "metadata": {}, "output_type": "display_data" }, { "name": "stdout", "output_type": "stream", "text": [ "\n" ] } ], "source": [ "# Doesn't work on Google Colaboratory. 
Workaround here: \n", "# https://colab.research.google.com/github/rjdoubleu/Colab-Papermill-Patch/blob/master/Colab-Papermill-Driver.ipynb\n", "notebook_dir = \"./papermill\"\n", "result = pm.execute_notebook(\n", "    os.path.join(notebook_dir, \"simple_input.ipynb\"),\n", "    os.path.join(notebook_dir, \"simple_output.ipynb\"),\n", "    parameters={\"msg\": \"Hello\"},\n", ")" ] } ], "metadata": { "celltoolbar": "Diaporama", "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.7.5" } }, "nbformat": 4, "nbformat_minor": 4 }