{
 "cells": [
  {
   "cell_type": "markdown",
   "source": [
    "# Tutorial 16: Poisson equation on parallel distributed-memory computers"
   ],
   "metadata": {}
  },
  {
   "cell_type": "markdown",
   "source": [
    "## Introduction"
   ],
   "metadata": {}
  },
  {
   "cell_type": "markdown",
   "source": [
    "In this tutorial we will learn how to use [`GridapDistributed.jl`](https://github.com/gridap/GridapDistributed.jl) and its satellite packages, [`GridapP4est.jl`](https://github.com/gridap/GridapP4est.jl), [`GridapGmsh.jl`](https://github.com/gridap/GridapGmsh.jl), and [`GridapPETSc.jl`](https://github.com/gridap/GridapPETSc.jl), in order to solve a Poisson PDE problem  on the unit square using grad-conforming Lagrangian Finite Elements for numerical discretization."
   ],
   "metadata": {}
  },
  {
   "cell_type": "markdown",
   "source": [
    "We will first solve the problem using solely the built-in tools in `GridapDistributed.jl`. While this is very useful for testing and debugging purposes, `GridapDistributed.jl` is *not* a library of parallel solvers. Indeed, the built-in linear solver kernel within `GridapDistributed.jl`, defined with the backslash operator `\\`, is just a sparse LU solver applied to the global system gathered on a master task (thus not scalable). To address this, we will then illustrate which changes are required in the program to replace the built-in solver in `GridapDistributed.jl` by `GridapPETSc.jl`. This latter package provides the full set of scalable linear and nonlinear solvers in the [PETSc](https://petsc.org/release/) numerical package."
   ],
   "metadata": {}
  },
  {
   "cell_type": "markdown",
   "source": [
    "On the other hand, in real-world applications, one typically needs to solve PDEs on more complex domains than simple boxes. To this end, we can leverage either `GridapGmsh.jl`, in order to partition and distribute automatically unstructured meshes read from disk in gmsh format, or `GridapP4est.jl`, which allows one to mesh in a very scalable way computational domains which can be decomposed as forests of octrees. The last part of the tutorial will present the necessary changes in the program in order to use these packages."
   ],
   "metadata": {}
  },
  {
   "cell_type": "markdown",
   "source": [
    "**IMPORTANT NOTE**: the parallel codes in this tutorial depend on the Message Passing Interface (MPI). Thus, they cannot be easily executed interactively, e.g., in a Jupyter notebook. Instead, one has to run them from a terminal using the [`mpiexecjl`](https://juliaparallel.github.io/MPI.jl/stable/configuration/#Julia-wrapper-for-mpiexec) script as provided by [MPI.jl](https://github.com/JuliaParallel/MPI.jl), e.g., with the command `mpiexecjl --project=. -n 4 julia src/poisson_distributed.jl` run from the root directory of the Tutorials git repository."
   ],
   "metadata": {}
  },
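   {
    "cell_type": "markdown",
    "source": [
     "If the `mpiexecjl` wrapper script is not yet available on your system, it can be installed from Julia itself. The cell below is a minimal one-time setup sketch; it assumes that `MPI.jl` is already added to your environment, and we refer to the MPI.jl documentation linked above for details such as the installation directory."
    ],
    "metadata": {}
   },
   {
    "outputs": [],
    "cell_type": "code",
    "source": [
     "# One-time setup sketch: install the `mpiexecjl` wrapper script shipped with MPI.jl.\n",
     "# By default it is placed under the Julia depot (e.g. `~/.julia/bin`), a directory\n",
     "# you may want to add to your PATH.\n",
     "using MPI\n",
     "MPI.install_mpiexecjl()"
    ],
    "metadata": {},
    "execution_count": null
   },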
  {
   "cell_type": "markdown",
   "source": [
    "## First example: `GridapDistributed.jl` built-in tools"
   ],
   "metadata": {}
  },
  {
   "outputs": [],
   "cell_type": "code",
   "source": [
    "using Gridap\n",
    "using GridapDistributed\n",
    "using PartitionedArrays"
   ],
   "metadata": {},
   "execution_count": null
  },
  {
   "cell_type": "markdown",
   "source": [
    "The first step in any `GridapDistributed.jl` program is to define a function (named `main_ex1` below) to be executed on each part on which the domain is distributed. This function receives two arguments, `rank_partition` and `distribute`. The former is the process grid layout, `(2,2)` in this case, and the latter is a function that creates a distributed array with the identifiers of the parallel processes. The body of this function is equivalent to a sequential `Gridap` script, except for the `CartesianDiscreteModel` call, which in `GridapDistributed` also requires the `parts` and `rank_partition` arguments passed to the `main_ex1` function. The domain is discretized using the parallel Cartesian-like mesh generator built-in in `GridapDistributed`."
   ],
   "metadata": {}
  },
  {
   "outputs": [],
   "cell_type": "code",
   "source": [
    "function main_ex1(rank_partition,distribute)\n",
    "  parts  = distribute(LinearIndices((prod(rank_partition),)))\n",
    "  domain = (0,1,0,1)\n",
    "  mesh_partition = (4,4)\n",
    "  model = CartesianDiscreteModel(parts,rank_partition,domain,mesh_partition)\n",
    "  order = 2\n",
    "  u((x,y)) = (x+y)^order\n",
    "  f(x) = -Δ(u,x)\n",
    "  reffe = ReferenceFE(lagrangian,Float64,order)\n",
    "  V = TestFESpace(model,reffe,dirichlet_tags=\"boundary\")\n",
    "  U = TrialFESpace(u,V)\n",
    "  Ω = Triangulation(model)\n",
    "  dΩ = Measure(Ω,2*order)\n",
    "  a(u,v) = ∫( ∇(v)⋅∇(u) )dΩ\n",
    "  l(v) = ∫( v*f )dΩ\n",
    "  op = AffineFEOperator(a,l,U,V)\n",
    "  uh = solve(op)\n",
    "  writevtk(Ω,\"results_ex1\",cellfields=[\"uh\"=>uh,\"grad_uh\"=>∇(uh)])\n",
    "end"
   ],
   "metadata": {},
   "execution_count": null
  },
  {
   "cell_type": "markdown",
   "source": [
    "Once the `main_ex1` function has been defined, we have to trigger its execution on the different parts. To this end, one calls the `with_mpi` function of [`PartitionedArrays.jl`](https://github.com/fverdugo/PartitionedArrays.jl) right at the beginning of the program."
   ],
   "metadata": {}
  },
  {
   "outputs": [],
   "cell_type": "code",
   "source": [
    "rank_partition = (2,2)\n",
    "with_mpi() do distribute\n",
    "  main_ex1(rank_partition,distribute)\n",
    "end"
   ],
   "metadata": {},
   "execution_count": null
  },
  {
   "cell_type": "markdown",
   "source": [
    "The `with_mpi(f)` function receives one function (defined in-situ using Julia's do-block function call syntax here) assumed to have a single argument, the `distribute` function (see above). This function is called from `with_mpi(f)` and executed on each part. It in turn calls the `main_ex1` function, which does the actual work."
   ],
   "metadata": {}
  },
  {
   "cell_type": "markdown",
   "source": [
    "Although not illustrated in this tutorial, we note that one may also use the `with_debug(f)` `PartitionedArrays.jl` function, instead of `with_mpi(f)`. With this function, the code executes serially on a single process (and there is thus no need to use `mpiexecjl` to launch the program), although  the data structures are still partitioned into parts. This is very useful, among others, for interactive execution of the code, and debugging, before moving to MPI parallelism."
   ],
   "metadata": {}
  },
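   {
    "cell_type": "markdown",
    "source": [
     "For illustration, the following sketch runs the first example with the debug back-end. It assumes that `main_ex1` has been defined as above; the only change with respect to the MPI driver is the replacement of `with_mpi` by `with_debug`."
    ],
    "metadata": {}
   },
   {
    "outputs": [],
    "cell_type": "code",
    "source": [
     "# Debug-mode sketch: the same driver as before, executed serially on a single\n",
     "# process using the debug back-end of PartitionedArrays (no `mpiexecjl` needed).\n",
     "rank_partition = (2,2)\n",
     "with_debug() do distribute\n",
     "  main_ex1(rank_partition,distribute)\n",
     "end"
    ],
    "metadata": {},
    "execution_count": null
   },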
  {
   "cell_type": "markdown",
   "source": [
    "## Second example: `GridapDistributed.jl` + `GridapPETSc.jl` for the linear solver"
   ],
   "metadata": {}
  },
  {
   "outputs": [],
   "cell_type": "code",
   "source": [
    "using GridapPETSc"
   ],
   "metadata": {},
   "execution_count": null
  },
  {
   "cell_type": "markdown",
   "source": [
    "In this example we use `GridapPETSc.jl` to have access to a scalable linear solver. The code is almost identical as the one above (see below). The main difference is that now we are wrapping most of the code of the `main_ex2` function within a do-block syntax function call to the `GridapPETSc.with(args=split(options))` function. The `with` function receives as a first argument a function with no arguments with the instructions to be executed on each MPI task/subdomain (that we pass to it as an anonymous function with no arguments), along with the `options` to be passed to the PETSc linear solver. For a detailed explanation of possible options we refer to the PETSc library documentation. Note that the call to `PETScLinearSolver()` initializes the PETSc solver with these `options` (even though `options` is not actually passed to the linear solver constructor). Besides, we have to pass the created linear solver object `solver` to the `solve` function to override the default linear solver (i.e., a call to the backslash `\\` Julia operator)."
   ],
   "metadata": {}
  },
  {
   "outputs": [],
   "cell_type": "code",
   "source": [
    "function main_ex2(rank_partition,distribute)\n",
    "  parts  = distribute(LinearIndices((prod(rank_partition),)))\n",
    "  options = \"-ksp_type cg -pc_type gamg -ksp_monitor\"\n",
    "  GridapPETSc.with(args=split(options)) do\n",
    "    domain = (0,1,0,1)\n",
    "    mesh_partition = (4,4)\n",
    "    model = CartesianDiscreteModel(parts,rank_partition,domain,mesh_partition)\n",
    "    order = 2\n",
    "    u((x,y)) = (x+y)^order\n",
    "    f(x) = -Δ(u,x)\n",
    "    reffe = ReferenceFE(lagrangian,Float64,order)\n",
    "    V = TestFESpace(model,reffe,dirichlet_tags=\"boundary\")\n",
    "    U = TrialFESpace(u,V)\n",
    "    Ω = Triangulation(model)\n",
    "    dΩ = Measure(Ω,2*order)\n",
    "    a(u,v) = ∫( ∇(v)⋅∇(u) )dΩ\n",
    "    l(v) = ∫( v*f )dΩ\n",
    "    op = AffineFEOperator(a,l,U,V)\n",
    "    solver = PETScLinearSolver()\n",
    "    uh = solve(solver,op)\n",
    "    writevtk(Ω,\"results_ex2\",cellfields=[\"uh\"=>uh,\"grad_uh\"=>∇(uh)])\n",
    "  end\n",
    "end\n",
    "\n",
    "rank_partition = (2,2)\n",
    "with_mpi() do distribute\n",
    "  main_ex2(rank_partition,distribute)\n",
    "end"
   ],
   "metadata": {},
   "execution_count": null
  },
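   {
    "cell_type": "markdown",
    "source": [
     "As mentioned above, the behaviour of the PETSc solver is fully determined by the `options` string. As an illustrative variation (the option names below are standard PETSc options; any combination supported by PETSc may be used), one could, e.g., replace the conjugate gradient solver preconditioned with algebraic multigrid by GMRES preconditioned with an additive Schwarz method, and tighten the relative tolerance, by redefining `options` in `main_ex2` as follows:"
    ],
    "metadata": {}
   },
   {
    "outputs": [],
    "cell_type": "code",
    "source": [
     "# Illustrative alternative PETSc options string; only the `options` variable in\n",
     "# `main_ex2` would change. See the PETSc documentation for the available Krylov\n",
     "# solvers (-ksp_type) and preconditioners (-pc_type).\n",
     "options = \"-ksp_type gmres -pc_type asm -sub_pc_type ilu -ksp_rtol 1.0e-8 -ksp_monitor\""
    ],
    "metadata": {},
    "execution_count": null
   },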
  {
   "cell_type": "markdown",
   "source": [
    "## Third example: second example + `GridapP4est.jl` for mesh generation"
   ],
   "metadata": {}
  },
  {
   "cell_type": "markdown",
   "source": [
    "In this example, we define the Cartesian mesh using `GridapP4est.jl` via recursive uniform refinement starting with a single cell. It only involves minor modifications compared to the previous example. First, one has to generate a coarse mesh of the domain. As the domain is a just a simple box in the example, it suffices to use a coarse mesh with a single quadrilateral fitted to the box in order to capture the geometry of the domain with no geometrical error (see how the `coarse_discrete_model` object is generated). In more complex scenarios, one can read an unstructured coarse mesh from disk, generated, e.g., with an unstructured brick mesh generator. Second, when building the fine mesh of the domain (see `UniformlyRefinedForestOfOctreesDiscreteModel` call), one has to specify the number of uniform refinements to be performed on the coarse mesh in order to generate the fine mesh. Finally, when calling `with_mpi(f)`, we do not longer specify a Cartesian partition but just the number of parts."
   ],
   "metadata": {}
  },
  {
   "outputs": [],
   "cell_type": "code",
   "source": [
    "using GridapP4est\n",
    "\n",
    "function main_ex3(nparts,distribute)\n",
    "  parts   = distribute(LinearIndices((nparts,)))\n",
    "  options = \"-ksp_type cg -pc_type gamg -ksp_monitor\"\n",
    "  GridapPETSc.with(args=split(options)) do\n",
    "    domain = (0,1,0,1)\n",
    "    coarse_mesh_partition = (1,1)\n",
    "    num_uniform_refinements = 2\n",
    "    coarse_discrete_model = CartesianDiscreteModel(domain,coarse_mesh_partition)\n",
    "    model = UniformlyRefinedForestOfOctreesDiscreteModel(parts,\n",
    "                                                       coarse_discrete_model,\n",
    "                                                       num_uniform_refinements)\n",
    "    order = 2\n",
    "    u((x,y)) = (x+y)^order\n",
    "    f(x) = -Δ(u,x)\n",
    "    reffe = ReferenceFE(lagrangian,Float64,order)\n",
    "    V = TestFESpace(model,reffe,dirichlet_tags=\"boundary\")\n",
    "    U = TrialFESpace(u,V)\n",
    "    Ω = Triangulation(model)\n",
    "    dΩ = Measure(Ω,2*order)\n",
    "    a(u,v) = ∫( ∇(v)⋅∇(u) )dΩ\n",
    "    l(v) = ∫( v*f )dΩ\n",
    "    op = AffineFEOperator(a,l,U,V)\n",
    "    solver = PETScLinearSolver()\n",
    "    uh = solve(solver,op)\n",
    "    writevtk(Ω,\"results_ex3\",cellfields=[\"uh\"=>uh,\"grad_uh\"=>∇(uh)])\n",
    "  end\n",
    "end\n",
    "\n",
    "nparts = 4\n",
    "with_mpi() do distribute\n",
    "  main_ex3(nparts,distribute)\n",
    "end"
   ],
   "metadata": {},
   "execution_count": null
  },
  {
   "cell_type": "markdown",
   "source": [
    "## Fourth example: second example + `GridapGmsh.jl` for mesh generation"
   ],
   "metadata": {}
  },
  {
   "cell_type": "markdown",
   "source": [
    "In this example, we want to use an unstructured mesh. The mesh is read from disk and partitioned/distributed automatically by `GridapGmsh` inside the call to the `GmshDiscreteModel` constructor."
   ],
   "metadata": {}
  },
  {
   "outputs": [],
   "cell_type": "code",
   "source": [
    "using GridapGmsh\n",
    "function main_ex4(nparts,distribute)\n",
    "  parts  = distribute(LinearIndices((nparts,)))\n",
    "  options = \"-ksp_type cg -pc_type gamg -ksp_monitor\"\n",
    "  GridapPETSc.with(args=split(options)) do\n",
    "    model = GmshDiscreteModel(parts,\"../models/demo.msh\")\n",
    "    order = 2\n",
    "    u((x,y)) = (x+y)^order\n",
    "    f(x) = -Δ(u,x)\n",
    "    reffe = ReferenceFE(lagrangian,Float64,order)\n",
    "    V = TestFESpace(model,reffe,dirichlet_tags=[\"boundary1\",\"boundary2\"])\n",
    "    U = TrialFESpace(u,V)\n",
    "    Ω = Triangulation(model)\n",
    "    dΩ = Measure(Ω,2*order)\n",
    "    a(u,v) = ∫( ∇(v)⋅∇(u) )dΩ\n",
    "    l(v) = ∫( v*f )dΩ\n",
    "    op = AffineFEOperator(a,l,U,V)\n",
    "    solver = PETScLinearSolver()\n",
    "    uh = solve(solver,op)\n",
    "    writevtk(Ω,\"results_ex4\",cellfields=[\"uh\"=>uh,\"grad_uh\"=>∇(uh)])\n",
    "  end\n",
    "end\n",
    "\n",
    "nparts = 4\n",
    "with_mpi() do distribute\n",
    "  main_ex4(nparts,distribute)\n",
    "end"
   ],
   "metadata": {},
   "execution_count": null
  },
  {
   "cell_type": "markdown",
   "source": [
    "---\n",
    "\n",
    "*This notebook was generated using [Literate.jl](https://github.com/fredrikekre/Literate.jl).*"
   ],
   "metadata": {}
  }
 ],
 "nbformat_minor": 3,
 "metadata": {
  "language_info": {
   "file_extension": ".jl",
   "mimetype": "application/julia",
   "name": "julia",
   "version": "1.10.8"
  },
  "kernelspec": {
   "name": "julia-1.10",
   "display_name": "Julia 1.10.8",
   "language": "julia"
  }
 },
 "nbformat": 4
}