{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Part 4: MPI for Python\n",
    "\n",
    "MPI is the __Message Passing Interface__ - a standard used for parallel programming involving communication between separate parallel processes each with their own separate memory allocation. MPI processes have to pass messages between themselves to invoke code execution and share data between with each other.  \n",
    "\n",
    "MPI is commonly used in distributed memory systems, computer systems composed of computer nodes, each with their own separate physical memory, such as high-performance computers or cluster machines. Equally though, MPI will run just as well on your laptop computer and is not tied to any particular architecture. \n",
    "\n",
    "In this mini tutorial we introduce the `mpi4py` Python package, which provides Python bindings to the MPI libraries, similar to the corresponding C, C++, and Fortran MPI APIs.\n",
    "\n",
    "If you are familiar with existing C/C++ MPI, you will find picking up the syntax in the Python bindings very similar. However, don't worry if you are completely new to MPI - we will introduce some of the simpler concepts from a Python perspective to get you started. However, unlike the previous parts of the tutorial, _the Python version of mpi4py does little to hide away the underlying parallel library. (In contrast to numba or Cython, which use OpenMP or OpenMP-like libraries, but hide the details away from the user moreso.)_\n",
    "\n",
    "If you'd prefer to learn more about the basics of MPI before starting this tutorial, Lawrence Livermore National Lab have a good introductory tutorial that covers the concepts well: https://computing.llnl.gov/tutorials/mpi/\n",
    "\n",
    "\n",
    "## Following this tutorial\n",
    "\n",
    "Jupyter notebooks do not work particularly well with `mpi4py` (It is possible to set them up to work together, but that is beyond the scope of this tutorial.) We will demonstrate a few simple examples in this notebook, but to actually run the code it is easier to copy or type out the examples into a Python script and save it as a `.py` file. You would then execute the python script using the mpi launcher, usually called `mpirun` on most systems. For example to run a sample `mpi4py` python script over 4 processes, we would run the python script like so:\n",
    "\n",
    "```\n",
    "mpirun -np 4 python mpi_python_script.py\n",
    "```\n",
    "\n",
    "To get started with any mpi4py script, we would add to our Python code the following import statement:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [],
   "source": [
    "from mpi4py import MPI"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Then a simple test of `mpi4py` would print \"Hello world!\" or similar from each MPI process:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Hello! I'm rank 0 from 1 running in total...\n"
     ]
    }
   ],
   "source": [
    "from mpi4py import MPI\n",
    "\n",
    "comm = MPI.COMM_WORLD\n",
    "\n",
    "print(\"Hello! I'm rank {} from {} running in total...\".format(comm.rank, comm.size))\n",
    "\n",
    "# Wait for everyone to syncronize here:\n",
    "comm.Barrier()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "So what's going on here?\n",
    "\n",
    "Firstly we set up a `comm` object, which is a communicator that lets us get information about and talk to the other processes. (You will see it in most MPI programs.)\n",
    "\n",
    "In the print function, we make a call to `comm.rank`, which gets the ID of the current process in the communicator (its 'rank') and then `comm.size` which gives is the total number of running processes in the communicator object.\n",
    "\n",
    "At the end we have to call `comm.Barrier` to synchronise the processes in the communicator.\n",
    "\n",
    "Unfortunately, if running this in a jupyter notebook, we will only likely see rank 0 and 1 total process, because of the way Python is running inside a single process. To see the effects of multiple MPI processes, we would ideally run the above code as a script from the terminal as outlined above, i.e. `mpirun -np 4 python mpi_hello.py`"
   ]
  },
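  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "As a quick illustration of how `comm.rank` and `comm.size` are typically used together, here is a minimal sketch (the script name `rank_slices.py` and the array contents are purely illustrative) in which each process picks out its own strided slice of an array - a common way of dividing work across ranks:\n",
    "\n",
    "```python\n",
    "# rank_slices.py - hypothetical example; run with: mpirun -np 4 python rank_slices.py\n",
    "from mpi4py import MPI\n",
    "import numpy as np\n",
    "\n",
    "comm = MPI.COMM_WORLD\n",
    "\n",
    "# Every rank builds the same array, then takes every size-th element\n",
    "# starting from its own rank, so the work is split with no overlap.\n",
    "data = np.arange(12)\n",
    "my_items = data[comm.rank::comm.size]\n",
    "\n",
    "print(\"Rank {} of {} works on {}\".format(comm.rank, comm.size, my_items))\n",
    "```\n",
    "\n",
    "With 4 processes, each rank prints a different 3-element slice of the array."
   ]
  },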
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Point-to-Point communication example\n",
    "\n",
    "This example looks at how we would send a Python object from one process to another."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "```python\n",
    "from mpi4py import MPI\n",
    "import numpy\n",
    "\n",
    "comm = MPI.COMM_WORLD\n",
    "rank = comm.Get_rank()\n",
    "\n",
    "if rank == 0:\n",
    "    data = {'a': 7, 'b': 3.14}\n",
    "    comm.send(data, dest=1)\n",
    "elif rank == 1:\n",
    "    data = comm.recv(source=0)\n",
    "    print('On process 1, data is ',data)\n",
    "```\n",
    "\n",
    "As usual, we set up the `comm` object to begin with. Then we fetch the rank (ID) of each process. If we are on rank 0, we will use this process as the master process and initialise some data here (a Python dictionary object). Then we use `comm.send()` to send the Python data object to a particular process. \n",
    "\n",
    "If we are executing the other MPI process (rank 1), we are going to set up our receiving rank. We have to explicity say we want to receive the data from rank 0. We also tell this rank=1 process to print out the dictionary we have received. \n",
    "\n",
    "To run this, create a separate python script with the above code and run it with:\n",
    "\n",
    "`mpirun -np 2 python point_mpi.py`\n",
    "\n",
    "The output should tell us that we have received the dictionary data on process 1:\n",
    "\n",
    "```\n",
    "On process 1, data is  {'a': 7, 'b': 3.14}\n",
    "```"
   ]
  },
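  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The same `send`/`recv` pair works in both directions. Below is a minimal sketch (the script name `pingpong_mpi.py` and the `tag` values are illustrative, not part of the example above) in which rank 1 sends a reply back to rank 0, with `tag` used to label the messages so that each `recv` matches the intended `send`:\n",
    "\n",
    "```python\n",
    "# pingpong_mpi.py - run with: mpirun -np 2 python pingpong_mpi.py\n",
    "from mpi4py import MPI\n",
    "\n",
    "comm = MPI.COMM_WORLD\n",
    "rank = comm.Get_rank()\n",
    "\n",
    "if rank == 0:\n",
    "    comm.send({'msg': 'ping'}, dest=1, tag=11)  # send a dict to rank 1\n",
    "    reply = comm.recv(source=1, tag=22)         # then wait for the reply\n",
    "    print('Rank 0 received', reply)\n",
    "elif rank == 1:\n",
    "    data = comm.recv(source=0, tag=11)          # receive the dict from rank 0\n",
    "    comm.send({'msg': 'pong'}, dest=0, tag=22)  # send a reply back\n",
    "```"
   ]
  },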
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Broadcasting example\n",
    "\n",
    "Broadcasting is another feature in MPI where a copy of some data is sent to every other MPI process (to be used to perform some further calculation on it by each process for example).\n",
    "\n",
    "Another example that demonstrates the interface with MPI is broadcasting a vector to `n` MPI processes.\n",
    "\n",
    "In this example, we are going to take a 1D numpy array and _broadcast_ it from rank 0 to the other ranks (processes). \n",
    "\n",
    "_**Note**: Perhaps confusingly MPI and NumPy both use the term __broadcasting__ to mean different things._"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "------------------------------------------------------------------------------\n",
      " Running on 1 cores\n",
      "------------------------------------------------------------------------------\n",
      "[00] [0. 1. 2. 3. 4.]\n"
     ]
    }
   ],
   "source": [
    "import numpy as np\n",
    "from mpi4py import MPI\n",
    "\n",
    "comm = MPI.COMM_WORLD\n",
    "\n",
    "print(\"-\"*78)\n",
    "print(\" Running on %d cores\" % comm.size)\n",
    "print(\"-\"*78)\n",
    "\n",
    "comm.Barrier()\n",
    "\n",
    "# Prepare a vector of N=5 elements to be broadcasted...\n",
    "N = 5\n",
    "if comm.rank == 0:\n",
    "    A = np.arange(N, dtype=np.float64)    # rank 0 has the proper data\n",
    "else:\n",
    "    A = np.empty(N, dtype=np.float64)     # all other ranks just an empty array\n",
    "\n",
    "# Broadcast array A from rank 0 to everybody\n",
    "comm.Bcast( [A, MPI.DOUBLE] )\n",
    "# The list argument contains the array to be broadcast and the corresponding\n",
    "# MPI data type: MPI.DOUBLE\n",
    "\n",
    "# Everybody should now have the same...\n",
    "print(\"[%02d] %s\" % (comm.rank, A))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Note how if you are running this in a Jupyter notebook, you are only going to see `1 core` reported. This is because we are running the python code through a standard Python interpreter, rather than invoking it with a command like `mpirun`. To see the effects of this with multiple MPI processes running at once, you need to run the code above in a script called `broadcast_mpi.py` from the command line with:"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "`mpirun -np 4 python broadcast_mpi.py`"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "This command now invokes the `python` interpreter through the `mpirun` executable, allowing the mpi4py python library to interact with the MPI API, and run on multiple cores on a multicore CPU.\n",
    "\n",
    "The output (from the terminal) should now look something like:"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "```\n",
    "------------------------------------------------------------------------------\n",
    " Running on 4 cores\n",
    "------------------------------------------------------------------------------\n",
    "[00] [0. 1. 2. 3. 4.]\n",
    "------------------------------------------------------------------------------\n",
    " Running on 4 cores\n",
    "------------------------------------------------------------------------------\n",
    "[01] [0. 1. 2. 3. 4.]\n",
    "------------------------------------------------------------------------------\n",
    " Running on 4 cores\n",
    "------------------------------------------------------------------------------\n",
    "[02] [0. 1. 2. 3. 4.]\n",
    "------------------------------------------------------------------------------\n",
    " Running on 4 cores\n",
    "------------------------------------------------------------------------------\n",
    "[03] [0. 1. 2. 3. 4.]\n",
    "```\n",
    "\n",
    "Note how now each process has received a copy of the same array from the broadcasting operation. In a real program, we might then proceed to mainpulate this array or perform calculations on it. \n",
    "\n",
    "\n",
    "## MPI Pi example\n",
    "\n",
    "In the final example, we set up a a parent process which spawns its own children MPI processes that will run a script to calculate a partial value of pi. The parent process then calls a reduction (`comm.Reduce`) which will sum up all the partial values of pi from each child process, giving the total estimate for pi. \n",
    "\n",
    "https://en.wikipedia.org/wiki/Leibniz_formula_for_%CF%80\n",
    "\n"
   ]
  },
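  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "To spell out the maths the two scripts below implement: pi is computed as the integral\n",
    "\n",
    "$$\\pi = \\int_0^1 \\frac{4}{1+x^2}\\,dx \\;\\approx\\; h \\sum_{i=0}^{N-1} \\frac{4}{1+x_i^2}, \\qquad h = \\frac{1}{N}, \\quad x_i = h\\,(i + 0.5),$$\n",
    "\n",
    "i.e. the midpoint rule with $N$ sample points. Each child accumulates the partial sum over the points $i = \\mathrm{rank},\\ \\mathrm{rank}+\\mathrm{size},\\ \\mathrm{rank}+2\\,\\mathrm{size},\\ \\ldots$ (the `range(rank, N, size)` loop in `child_pi.py`), and `comm.Reduce` adds the partial sums together on the parent."
   ]
  },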
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "```python\n",
    "# parent_pi.py\n",
    "from mpi4py import MPI\n",
    "import numpy\n",
    "import sys\n",
    "\n",
    "comm = MPI.COMM_SELF.Spawn(sys.executable,\n",
    "                           args=['child_pi.py'],\n",
    "                           maxprocs=5)\n",
    "\n",
    "N = numpy.array(100, 'i')\n",
    "comm.Bcast([N, MPI.INT], root=MPI.ROOT)\n",
    "PI = numpy.array(0.0, 'd')\n",
    "comm.Reduce(None, [PI, MPI.DOUBLE],\n",
    "            op=MPI.SUM, root=MPI.ROOT)\n",
    "print(PI)\n",
    "\n",
    "comm.Disconnect()\n",
    "```"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "```python\n",
    "# child_pi.py\n",
    "from mpi4py import MPI\n",
    "import numpy\n",
    "\n",
    "comm = MPI.Comm.Get_parent()\n",
    "size = comm.Get_size()\n",
    "rank = comm.Get_rank()\n",
    "\n",
    "N = numpy.array(0, dtype='i')\n",
    "comm.Bcast([N, MPI.INT], root=0)\n",
    "\n",
    "h = 1.0 / N; s = 0.0\n",
    "for i in range(rank, N, size):\n",
    "    x = h * (i + 0.5)\n",
    "    s += 4.0 / (1.0 + x**2)\n",
    "PI = numpy.array(s * h, dtype='d')\n",
    "\n",
    "comm.Reduce([PI, MPI.DOUBLE], None,\n",
    "            op=MPI.SUM, root=0)\n",
    "\n",
    "comm.Disconnect()\n",
    "```\n",
    "\n",
    "To run this code, make sure you have created the two scripts with the names `parent_pi.py` and `child_pi.py` and placed them in the same folder. Then start mpirun but with only 1 process - the `Spawn` method will create new MPI processes that run the `child_pi.py` script.\n",
    "\n",
    "`mpirun -np 1 python parent_pi.py`\n",
    "\n",
    "The output should return a reasonable value for Pi. *Hint: you can get a more/less accurate value for pi by changing the value of `N` in the `parent_pi.py` part of the solution.*\n",
    "\n",
    "Note how in the examples there are slightly different ways for sending Python objects compared to sending NumPy arrays. Also when using numpy arrays with MPI functions, we have to specific the data-type in the list argument that is passed to the MPI function. e.g.\n",
    "\n",
    "```\n",
    "comm.Bcast([N, MPI.INT], ...\n",
    "```\n",
    "\n",
    "whereas for a Pytohn dictionary object, this would be:\n",
    "\n",
    "```\n",
    "comm.bcast(my_dict, ...\n",
    "```\n",
    "\n",
    "(Note also the lowercase `bcast`)"
   ]
  },
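  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "To make that distinction concrete, here is a minimal sketch (the script name `array_send.py` is illustrative) of the earlier point-to-point pattern rewritten with the capitalised, buffer-based `Send`/`Recv` methods and a NumPy array. Note that the receiver must pre-allocate a buffer of the right size and dtype:\n",
    "\n",
    "```python\n",
    "# array_send.py - run with: mpirun -np 2 python array_send.py\n",
    "from mpi4py import MPI\n",
    "import numpy as np\n",
    "\n",
    "comm = MPI.COMM_WORLD\n",
    "rank = comm.Get_rank()\n",
    "\n",
    "if rank == 0:\n",
    "    data = np.arange(10, dtype=np.float64)\n",
    "    comm.Send([data, MPI.DOUBLE], dest=1)   # buffer-based send of the array\n",
    "elif rank == 1:\n",
    "    data = np.empty(10, dtype=np.float64)   # pre-allocated receive buffer\n",
    "    comm.Recv([data, MPI.DOUBLE], source=0)\n",
    "    print('Rank 1 received', data)\n",
    "```"
   ]
  },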
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Summary and taking it further\n",
    "\n",
    "**mpi4py** is a Python interface to the MPI library for message-passing parallel programming. It provides an interface to the powerful Message-Passing Interface standard, a parallel programming standard commonly used in distribute memory parallel programming.\n",
    "\n",
    "In _Part 1: Multiprocessing_, you may recall we discussed the `multiprocessing` module, another pytohn module that creates separate processes and can be used to distribute parallel tasks. Multiprocessing is limited to creating OS-level processes on a shared-memory computing environment, and (to my knowledge) is not easy to apply across multi-node cluster computers, something which MPI (and by extension `mpi4py`) was designed specifically to do.\n",
    "\n",
    "So if your problem size requires the resources of larger compute clusters, `mpi4py` may be the more appropriate choice, though the learning curve is inevitably steeper due to requiring knowledge of the MPI standard to get the most out of the Python implementation.\n",
    "\n",
    "### Useful resources for taking it further\n",
    "\n",
    "Introductory MPI texts and tutorials would be just as useful for an overview of MPI in greater depth. The current documentation for mpi4py, although good, does tend to assume some knowledge of the concepts behind message passing and distributed memory programming. Therefore, starting with a basic MPI tutorial may be a good first move. E.g.: http://mpitutorial.com/tutorials/ is a nice set of introductory tutorials.\n",
    "\n",
    "MPI for Python documentation: https://mpi4py.readthedocs.io/en/stable/tutorial.html\n",
    "\n",
    "Excellent Python MPI tutorial (goes into more depth) https://nyu-cds.github.io/python-mpi/\n",
    "\n",
    "Another good tutorial based on the mpi4py documentation: https://rabernat.github.io/research_computing/parallel-programming-with-mpi-for-python.html"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.6.6"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}