{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Why Python?\n", "\n", "Python is now the second most popular language on GitHub, behind only JavaScript.\n", "\n", "![GitHub Languages](./img/GitHubLang.png)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "* Jupyter notebook usage grew by over 100% every year for the last three years (as of 2019)!\n", "* Still one of the top 10 fastest-growing languages (as of 2022)!\n", "\n", "![GitHub Language Growth](./img/FastestGrowing.png)\n", "\n", "[State of the Octoverse, 2019 - 2022](https://octoverse.github.com)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Why Python?\n", "\n", "![PYPL Languages](./img/PYPLLang.png)\n", "\n", "[PYPL rankings](http://pypl.github.io/PYPL.html) of some of the most popular languages for data science. Quote: \"Worldwide, Python is the most popular language, Python grew the most in the last 5 years (6.9%)\" (March 2023)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Timeline of Python\n", "\n", "* 1994: Python 1.0 released\n", "* 1995: First array package: Numeric\n", "* 2003: Matplotlib released\n", "* 2005: Numeric and numarray merged into NumPy\n", "* 2008: Pandas introduced, Python 3 released\n", "* 2012: The Anaconda Python distribution was born\n", "* 2014: IPython spins off the Jupyter project and notebook\n", "* 2016: LIGO's discovery was shown in a Jupyter Notebook, and was written in Python\n", "* 2017: Google releases TensorFlow 1.0\n", "* 2019: Most major machine learning libraries are primarily or exclusively used through Python\n", "* 2020: Python 2 died, long live Python 3.6+!\n", "* 2022: The Faster CPython project delivers a roughly 25% average speedup in Python 3.11!" 
] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Timeline of Python, key points\n", "\n", "\n", "## 2005: NumPy\n", "* Merged two competing codebases, creating a single ecosystem\n", "\n", "## 2008: Pandas\n", "* Took on specialized statistics languages (like R) with a *library* in a general-purpose language\n", "* Pioneered \"Pythonic\" shortcuts, breaking down traditional design barriers\n", "\n", "## 2014: Jupyter\n", "* The notebook format, with code, outputs, and descriptions interleaved, became multilingual" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Python vs. a compiled language\n", "\n", "Python is an interpreted language. When we talk about Python, we usually mean CPython, which is not even Just-In-Time (JIT) compiled; it's purely interpreted.\n", "\n", "TLDR: Python is *slow*.\n", "\n", "It can be hundreds to thousands of times slower than C/C++/Fortran/Go/Swift/Rust/Haskell... You get the point.\n", "\n", "Python is like a car. Compiled languages are like a plane.\n", "\n", "So why use it?" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# A hybrid approach\n", "\n", "If you want to get to South America, the fastest way is to take a car to the airport and catch a plane.\n", "\n", "The same idea applies to Python and compiled languages. You can do the big, common, easy tasks in compiled languages, and steer them with Python.\n", "\n", "And, as you'll see today, that's easier than you think!" 
] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Mini-courses\n", "\n", "\n", "## High Performance Python: CPU\n", "\n", "* Today's class\n", "* How to make Python code fast *without* fully leaving Python\n", "\n", "\n", "## [High Performance Python: GPU](https://github.com/henryiii/pygpu-minicourse)\n", "\n", "* Using a GPU to accelerate code\n", "* Using accelerators to boost your code\n", "\n", "\n", "## [Compiled code & Python](https://github.com/henryiii/python-compiled-minicourse)\n", "\n", "* How to interface with compiled code (mostly C++) and use it to accelerate Python" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Lessons\n", "\n", "* [00 Intro](./00_intro.ipynb): The introduction\n", "* [01 Fractal accelerate](./01_fractal_accelerate.ipynb): A look at a fractal computation, and ways to accelerate it with NumPy changes, NumExpr, and Numba.\n", "  - [01b Fractal interactive](./01b_fractal_interactive.ipynb): An interactive example using Numba.\n", "* [02 Temperatures](./02_temperatures.ipynb): A look at reading files and array manipulation in NumPy and Pandas.\n", "* [03 MCMC](./03_mcmc.ipynb): A Markov Chain Monte Carlo generator (and Metropolis generator) in Python and Numba, with a focus on profiling.\n", "* [04 Runge-Kutta](./04_runge_kutta.ipynb): Implementing a popular integration algorithm in NumPy and Numba.\n", "* [05 Distributed](./05_distributed.ipynb): An exploration of ways to break up code (fractal) into chunks for multithreading, multiprocessing, and Dask distribution.\n", "* [06 Jax](./06_jax.ipynb): A look at Google's JAX.\n", "  - [06b Jax](./06b_jax.ipynb): More JAX.\n", "* [07 Callables](./07_callables.ipynb): A look at SciPy's `LowLevelCallable`, and how to implement one with Numba.\n", "* [08 Pandas COVID data](./08_pandas_covid.ipynb): A further look at Pandas for a COVID dataset.\n", "\n", "We may not go through these in order." 
] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Survey\n", "\n", "Before we finish, please complete the survey. We will give you some time near the end to fill it out." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "%config InteractiveShell.ast_node_interactivity='last_expr_or_assign'" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Background\n", "\n", "> 5-minute pause: Please look through the following text, or ask for help getting set up. We will go over this quickly after the pause. Most of it should be review, except for some ufunc specifics.\n", "\n", "Python lists/tuples can contain any Python object, so they use memory inefficiently and have a scattered layout:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import math\n", "\n", "import numpy as np" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "lst = [1, \"hi\", 3.0, \"🤣\"]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "*Each* Python object stores *at least* a type and a reference count. Objects can be different sizes, so a list only holds pointers, and Python has to chase those pointers to reach each item. NumPy introduced an array class:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "arr = np.array([1, 2, 3, 4])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The array object is a normal Python object (with refcounts and such), but the items *inside it* are stored nicely packed in memory, with a single \"dtype\" for all the data. You can use `dtype=object`, but with any other dtype, this is much nicer than pure Python for larger amounts of data. All the standard data types are available, rather than just the 64-bit `float` and arbitrary-precision `int` that regular Python provides." 
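, "\n", "\n", "As a rough sketch of the difference (the size `n` and the variable names are made up for illustration, and the list total only counts the pointers plus the `int` objects as reported by CPython's `sys.getsizeof`), we can compare the storage used by a list and by an array:" ] },
{ "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [
"import sys\n",
"\n",
"import numpy as np\n",
"\n",
"# Hypothetical example data: the same integers as a list and as an array\n",
"n = 100_000\n",
"py_list = list(range(n))\n",
"np_arr = np.arange(n, dtype=np.int64)\n",
"\n",
"# A list stores one pointer per item; the int objects live elsewhere\n",
"list_pointer_bytes = sys.getsizeof(py_list)\n",
"list_item_bytes = sum(sys.getsizeof(x) for x in py_list)\n",
"print(list_pointer_bytes + list_item_bytes)\n",
"\n",
"# An array stores the raw values back to back, itemsize bytes each\n",
"print(np_arr.dtype, np_arr.itemsize, np_arr.nbytes)\n",
"\n",
"# A smaller dtype halves the storage again\n",
"small = np_arr.astype(np.int32)\n",
"print(small.dtype, small.nbytes)"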
] }, { "cell_type": "markdown", "metadata": {}, "source": [ "NumPy provides \"array\" processing, where operations and functions are applied to whole arrays rather than inside Python loops. This usually lets the looping happen in a compiled language, skipping the type lookups and other per-item work Python would have to do. To facilitate this, NumPy introduced UFuncs, Generalized UFuncs, and functions that operate on arrays. The NumPy developers also helped Python 3 come up with a memory buffer interface for communicating data structures between libraries without requiring NumPy, and added an overload system for UFuncs (1.13) and, later, array functions (1.17).\n", "\n", "Out of all of that, let's peek at a UFunc:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "vals = np.linspace(0, np.pi, 9)\n", "\n", "# Ufunc: np.sin\n", "print(np.sin(vals))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "`np.sin` is a UFunc. It can be called on an array of any dimensionality, and it returns an array of the same shape, with the function (`sin`, in this case) transforming each element. If it took multiple arguments, each could be ND, and the output would be the broadcast combination of the inputs (this fails if the shapes are not compatible). There is a set of standard arguments, such as `out=` (use an existing array for the output), `where=` (mask items), `casting`, `order`, `dtype`, and `subok`. You can also call a set of standard methods, such as `accumulate`, `at`, `outer`, `reduce`, and `reduceat`, though some do not work on all ufuncs. There are some properties, too." 
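, "\n", "\n", "A minimal sketch of a few of these features follows. The inputs and the names `col`, `row`, `total`, and `roots` are made up for illustration:" ] },
{ "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [
"import numpy as np\n",
"\n",
"# Hypothetical inputs chosen for illustration\n",
"col = np.array([[0.0], [10.0], [20.0]])  # shape (3, 1)\n",
"row = np.array([1.0, 2.0, 3.0, 4.0])  # shape (4,)\n",
"\n",
"# A two-argument ufunc broadcasts its inputs to a common (3, 4) shape\n",
"total = np.add(col, row)\n",
"print(total.shape)\n",
"\n",
"# where= computes only at the masked positions; out= keeps its values elsewhere\n",
"vals = np.array([-4.0, 0.0, 9.0, 16.0])\n",
"roots = np.zeros_like(vals)\n",
"np.sqrt(vals, out=roots, where=vals > 0)\n",
"print(roots)\n",
"\n",
"# Standard methods, shown on np.add and np.multiply\n",
"print(np.add.reduce(row))  # sum of all items\n",
"print(np.add.accumulate(row))  # running sum\n",
"print(np.multiply.outer(row, row).shape)  # all pairwise products\n",
"\n",
"# Properties: number of inputs and outputs\n",
"print(np.add.nin, np.add.nout)"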
] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let's use `out=` to pre-allocate our own output:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "vals = np.linspace(0, np.pi, 9)\n", "# Pre-allocate the output, then have the ufunc write into it\n", "out = np.empty_like(vals)\n", "np.sin(vals, out=out)\n", "print(out)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The operators on arrays, along with most of the methods on arrays, are actually ufuncs and array functions defined elsewhere in NumPy:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# The + operator allocates a new array for the result\n", "out_simple = vals + vals\n", "\n", "# np.add is the ufunc behind +; here it writes into an existing array\n", "out_inplace = np.empty_like(vals)\n", "np.add(vals, vals, out=out_inplace)\n", "\n", "np.testing.assert_array_equal(out_simple, out_inplace)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We will consider the simple form of this, array manipulation with the simple operations, to be the baseline. There is a \"simpler\" baseline, or maybe just an older one: loops over arrays. I *think* most people who have learned Python in the last few years start quite early with array programming, so that is the form they know best, and we will start there." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Array looping method, do not use\n", "\n", "vals = np.linspace(0, np.pi, 9)\n", "out = []\n", "for val in vals:\n", "    out.append(math.sin(val))\n", "print(out)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Interesting projects\n", "\n", "I am part of [Scikit-HEP](http://scikit-hep.org), a project to build tools for High Energy Physicists in Python. 
Some of the projects are applicable outside of HEP:\n", "\n", "* [AwkwardArray](https://github.com/scikit-hep/awkward-array): Jagged array structures\n", "* [Vector](https://github.com/scikit-hep/vector): A package for 2D, 3D, and Lorentz vectors\n", "* [boost-histogram](https://github.com/scikit-hep/boost-histogram): A compiled package for powerful, fast histograms in Python\n", " - [hist](https://github.com/scikit-hep/hist), a package for fast analysis and plotting of histograms (in development)\n", "* [iMinuit](https://github.com/scikit-hep/iminuit): A powerful minimization package (used in HEP and Astrophysics)\n", "\n", "Other projects I am a developer on:\n", "\n", "* [scikit-build](https://github.com/scikit-build): A build backend for CMake code in Python\n", "* [pybind11](https://github.com/pybind/pybind11): Python Bindings in pure C++11+, no other tool needed!\n", "* [build](https://github.com/pypa/build): Build wheels and SDists for Python.\n", "* [cibuildwheel](https://github.com/pypa/cibuildwheel): Build redistributable binary wheels for Python!\n", "* [Plumbum](https://plumbum.readthedocs.io/en/latest/): A toolkit for bash-like scripting in Python\n", "* [CLI11](https://github.com/CLIUtils/CLI11): A command line parser for C++11" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Further reading\n", "\n", "## My Materials\n", "\n", "### Favorite posts and series\n", "\n", "[C++](https://iscinumpy.dev/tags/cppxx) [11](https://iscinumpy.dev/post/cpp-11) [14](https://iscinumpy.dev/post/cpp-14) [17](https://iscinumpy.dev/post/cpp-17) [20](https://iscinumpy.dev/post/cpp-20) •\n", "[macOS Setup](https://iscinumpy.dev/post/setup-a-new-mac) [(AS)](https://iscinumpy.dev/post/setup-apple-silicon) •\n", "[Azure DevOps](https://iscinumpy.dev/categories/azure-devops) ([Python Wheels](https://iscinumpy.dev/post/azure-devops-python-wheels)) •\n", "[Conda-Forge ROOT](https://iscinumpy.dev/post/root-conda) •\n", "[CLI11](https://iscinumpy.dev/tags/cli11) 
•\n", "[GooFit](https://iscinumpy.dev/tags/goofit) •\n", "[cibuildwheel](https://iscinumpy.dev/tags/cibuildwheel) •\n", "[Hist](https://iscinumpy.dev/tags/hist) •\n", "[Python Bindings](https://iscinumpy.dev/tags/bindings) •\n", "[Python 2→3](https://iscinumpy.dev/post/python-3-upgrade), [3.7](https://iscinumpy.dev/post/python-37), [3.8](https://iscinumpy.dev/post/python-38), [3.9](https://iscinumpy.dev/post/python-39), [3.10](https://iscinumpy.dev/post/python-310), [3.11](https://iscinumpy.dev/post/python-311) •\n", "[SSH](https://iscinumpy.dev/post/setting-up-ssh-forwarding/)\n", "\n", "### My classes and books\n", "\n", "[Modern CMake](https://cliutils.gitlab.io/modern-cmake/) •\n", "[CompClass](https://henryiii.github.io/compclass) •\n", "[se-for-sci](https://henryiii.github.io/se-for-sci)\n", "\n", "### My workshops\n", "\n", "[CMake Workshop](https://hsf-training.github.io/hsf-training-cmake-webpage/) •\n", "Python [CPU](https://github.com/henryiii/python-performance-minicourse), [GPU](https://github.com/henryiii/pygpu-minicourse), [Compiled](https://github.com/henryiii/python-compiled-minicourse) minicourses •\n", "[Level Up Your Python](https://henryiii.github.io/level-up-your-python) •\n", "[Packaging (WIP)](https://intersect-training.org/packaging/)\n", "\n", "### My projects\n", "\n", "[pybind11](https://pybind11.readthedocs.io) ([python_example](https://github.com/pybind/python_example), [cmake_example](https://github.com/pybind/cmake_example), [scikit_build_example](https://github.com/pybind/scikit_build_example)) •\n", "[cibuildwheel](https://cibuildwheel.readthedocs.io) •\n", "[build](https://pypa-build.readthedocs.io) •\n", "[scikit-build](https://github.com/scikit-build/scikit-build) ([core](https://github.com/scikit-build/scikit-build-core), [cmake](https://github.com/scikit-build/cmake-python-distributions), [ninja](https://github.com/scikit-build/ninja-python-distributions), [moderncmakedomain]()) •\n", 
"[boost-histogram](https://github.com/scikit-hep/boost-histogram) •\n", "[Hist](https://github.com/scikit-hep/hist) •\n", "[UHI](https://github.com/scikit-hep/uhi) •\n", "[Scikit-HEP/cookie](https://github.com/scikit-hep/cookie) •\n", "[Vector](https://github.com/scikit-hep/vector) •\n", "[CLI11](https://github.com/CLIUtils/CLI11) •\n", "[Plumbum](https://plumbum.readthedocs.io/en/latest) •\n", "[GooFit](https://github.com/GooFit/GooFit) •\n", "[Particle](https://github.com/scikit-hep/particle) •\n", "[DecayLanguage](https://github.com/scikit-hep/decaylanguage) •\n", "[Conda-Forge ROOT](https://github.com/conda-forge/root-feedstock) •\n", "[POVM](https://github.com/Princeton-Penn-Vents/princeton-penn-flowmeter) •\n", "[Jekyll-Indico](https://github.com/iris-hep/jekyll-indico) •\n", "[pytest GHA annotate-failures](https://github.com/utgwkk/pytest-github-actions-annotate-failures) •\n", "[uproot-browser](https://github.com/scikit-hep/uproot-browser) •\n", "[Scikit-HEP-repo-review](https://github.com/scikit-hep/repo-review) •\n", "[meson-python](https://github.com/mesonbuild/meson-python) •\n", "[flake8-errmsg](https://github.com/henryiii/flake8-errmsg) •\n", "[beautifulhugo](https://github.com/halogenica/beautifulhugo)\n", "\n", "\n", "### My sites\n", "\n", "[ISciNumPy](https://iscinumpy.dev) •\n", "[IRIS-HEP](https://iris-hep.org) •\n", "[Scikit-HEP](https://scikit-hep.org) ([Developer pages](https://scikit-hep.org/developer)) •\n", "[CLARIPHY](https://clariphy.org)\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "\n", "## Jim Pivarski's materials\n", "\n", "Jim taught earlier iterations of this mini-course, and his materials are great:\n", "\n", "* [Mini-course Fall 2018](https://github.com/jpivarski/python-numpy-mini-course)\n", "* [Mini-course Spring 2019](https://github.com/jpivarski/2019-04-08-picscie-numpy)\n", "* [CoDaS HEP Summer 2019](https://github.com/jpivarski/2019-07-23-codas-hep)\n", "* [DPF Summer 
2019](https://github.com/jpivarski/2019-07-29-dpf-python)" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.11.4" }, "vscode": { "interpreter": { "hash": "2d4e9b9c84dab3e1662173f95b81bd7f8a551068d04f5f3c42d164db7312a928" } } }, "nbformat": 4, "nbformat_minor": 4 }