{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Reduction"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "This section covers ways to perform reductions in parallel, task, taskloop, and SIMD regions."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### __reduction__ Clause"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The following example demonstrates the __reduction__ clause. Note that some reductions can be expressed in the loop in several ways, as shown for the __max__ and __min__ reductions below:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "//%compiler: clang\n",
    "//%cflags: -fopenmp\n",
    "\n",
    "/*\n",
    "* name: reduction.1\n",
    "* type: C\n",
    "* version: omp_3.1\n",
    "*/\n",
    "#include <math.h>\n",
    "void reduction1(float *x, int *y, int n)\n",
    "{\n",
    "  int i, b, c;\n",
    "  float a, d;\n",
    "  a = 0.0;\n",
    "  b = 0;\n",
    "  c = y[0];\n",
    "  d = x[0];\n",
    "  #pragma omp parallel for private(i) shared(x, y, n) \\\n",
    "                          reduction(+:a) reduction(^:b) \\\n",
    "                          reduction(min:c) reduction(max:d)\n",
    "    for (i=0; i<n; i++) {\n",
    "      a += x[i];\n",
    "      b ^= y[i];\n",
    "      if (c > y[i]) c = y[i];\n",
    "      d = fmaxf(d,x[i]);\n",
    "    }\n",
    "}"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "!!%compiler: gfortran\n",
    "!!%cflags: -fopenmp\n",
    "\n",
    "! name: reduction.1\n",
    "! type: F-free\n",
    "SUBROUTINE REDUCTION1(A, B, C, D, X, Y, N)\n",
    "    REAL :: X(*), A, D\n",
    "    INTEGER :: Y(*), N, B, C\n",
    "    INTEGER :: I\n",
    "    A = 0\n",
    "    B = 0\n",
    "    C = Y(1)\n",
    "    D = X(1)\n",
    "    !$OMP PARALLEL DO PRIVATE(I) SHARED(X, Y, N) REDUCTION(+:A) &\n",
    "    !$OMP& REDUCTION(IEOR:B) REDUCTION(MIN:C)  REDUCTION(MAX:D)\n",
    "      DO I=1,N\n",
    "        A = A + X(I)\n",
    "        B = IEOR(B, Y(I))\n",
    "        C = MIN(C, Y(I))\n",
    "        IF (D < X(I)) D = X(I)\n",
    "      END DO\n",
    "\n",
    "END SUBROUTINE REDUCTION1"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "A common implementation of the preceding example is to treat it as if it had been written as follows:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "//%compiler: clang\n",
    "//%cflags: -fopenmp\n",
    "\n",
    "/*\n",
    "* name: reduction.2\n",
    "* type: C\n",
    "*/\n",
    "#include <limits.h>\n",
    "#include <math.h>\n",
    "void reduction2(float *x, int *y, int n)\n",
    "{\n",
    "  int i, b, b_p, c, c_p;\n",
    "  float a, a_p, d, d_p;\n",
    "  a = 0.0f;\n",
    "  b = 0;\n",
    "  c = y[0];\n",
    "  d = x[0];\n",
    "  #pragma omp parallel shared(a, b, c, d, x, y, n) \\\n",
    "                          private(a_p, b_p, c_p, d_p)\n",
    "  {\n",
    "    a_p = 0.0f;\n",
    "    b_p = 0;\n",
    "    c_p = INT_MAX;\n",
    "    d_p = -HUGE_VALF;\n",
    "    #pragma omp for private(i)\n",
    "    for (i=0; i<n; i++) {\n",
    "      a_p += x[i];\n",
    "      b_p ^= y[i];\n",
    "      if (c_p > y[i]) c_p = y[i];\n",
    "      d_p = fmaxf(d_p,x[i]);\n",
    "    }\n",
    "    #pragma omp critical\n",
    "    {\n",
    "      a += a_p;\n",
    "      b ^= b_p;\n",
    "      if( c > c_p ) c = c_p;\n",
    "      d = fmaxf(d,d_p);\n",
    "    }\n",
    "  }\n",
    "}"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "!!%compiler: gfortran\n",
    "!!%cflags: -fopenmp\n",
    "\n",
    "! name: reduction.2\n",
    "! type: F-free\n",
    "  SUBROUTINE REDUCTION2(A, B, C, D, X, Y, N)\n",
    "    REAL :: X(*), A, D\n",
    "    INTEGER :: Y(*), N, B, C\n",
    "    REAL :: A_P, D_P\n",
    "    INTEGER :: I, B_P, C_P\n",
    "    A = 0\n",
    "    B = 0\n",
    "    C = Y(1)\n",
    "    D = X(1)\n",
    "    !$OMP PARALLEL SHARED(X, Y, A, B, C, D, N) &\n",
    "    !$OMP&         PRIVATE(A_P, B_P, C_P, D_P)\n",
    "      A_P = 0.0\n",
    "      B_P = 0\n",
    "      C_P = HUGE(C_P)\n",
    "      D_P = -HUGE(D_P)\n",
    "      !$OMP DO PRIVATE(I)\n",
    "      DO I=1,N\n",
    "        A_P = A_P + X(I)\n",
    "        B_P = IEOR(B_P, Y(I))\n",
    "        C_P = MIN(C_P, Y(I))\n",
    "        IF (D_P < X(I)) D_P = X(I)\n",
    "      END DO\n",
    "      !$OMP CRITICAL\n",
    "        A = A + A_P\n",
    "        B = IEOR(B, B_P)\n",
    "        C = MIN(C, C_P)\n",
    "        D = MAX(D, D_P)\n",
    "      !$OMP END CRITICAL\n",
    "    !$OMP END PARALLEL\n",
    "  END SUBROUTINE REDUCTION2"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The following program is non-conforming because the reduction is on the **intrinsic procedure name** __MAX__, but that name has been redefined to be the variable named __MAX__."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "!!%compiler: gfortran\n",
    "!!%cflags: -fopenmp\n",
    "\n",
    "! name: reduction.3\n",
    "! type: F-free\n",
    " PROGRAM REDUCTION_WRONG\n",
    " MAX = HUGE(0)\n",
    " M = 0\n",
    "\n",
    " !$OMP PARALLEL DO REDUCTION(MAX: M)\n",
    "! MAX is no longer the intrinsic so this is non-conforming\n",
    " DO I = 1, 100\n",
    "    CALL SUB(M,I)\n",
    " END DO\n",
    "\n",
    " END PROGRAM REDUCTION_WRONG\n",
    "\n",
    " SUBROUTINE SUB(M,I)\n",
    "    M = MAX(M,I)\n",
    " END SUBROUTINE SUB"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The following conforming program performs the reduction using the **intrinsic procedure name** __MAX__ even though the intrinsic __MAX__ has been renamed to __REN__."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "!!%compiler: gfortran\n",
    "!!%cflags: -fopenmp\n",
    "\n",
    "! name: reduction.4\n",
    "! type: F-free\n",
    "MODULE M\n",
    "   INTRINSIC MAX\n",
    "END MODULE M\n",
    "\n",
    "PROGRAM REDUCTION3\n",
    "   USE M, REN => MAX\n",
    "   N = 0\n",
    "!$OMP PARALLEL DO REDUCTION(REN: N)     ! still does MAX\n",
    "   DO I = 1, 100\n",
    "      N = MAX(N,I)\n",
    "   END DO\n",
    "END PROGRAM REDUCTION3"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The following conforming program performs the reduction using the **intrinsic procedure name** __MAX__ even though the intrinsic __MAX__ has been renamed to __MIN__."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "!!%compiler: gfortran\n",
    "!!%cflags: -fopenmp\n",
    "\n",
    "! name: reduction.5\n",
    "! type: F-free\n",
    "MODULE MOD\n",
    "   INTRINSIC MAX, MIN\n",
    "END MODULE MOD\n",
    "\n",
    "PROGRAM REDUCTION4\n",
    "   USE MOD, MIN=>MAX, MAX=>MIN\n",
    "   REAL :: R\n",
    "   R = -HUGE(0.0)\n",
    "\n",
    "!$OMP PARALLEL DO REDUCTION(MIN: R)     ! still does MAX\n",
    "   DO I = 1, 1000\n",
    "      R = MIN(R, SIN(REAL(I)))\n",
    "   END DO\n",
    "   PRINT *, R\n",
    "END PROGRAM REDUCTION4"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The following example is non-conforming because the initialization (__a = 0__) of the original list item __a__ is not synchronized with the update of __a__ that results from the reduction computation in the worksharing loop. Therefore, the example may print an incorrect value for __a__."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "To avoid this problem, the initialization of the original list item __a__ should complete before any update of __a__ as a result of the __reduction__ clause. This can be achieved by adding an explicit barrier after the assignment __a = 0__ (the __masked__ construct has no implied barrier), by enclosing the assignment __a = 0__ in a __single__ directive (which has an implied barrier), or by initializing __a__ before the start of the __parallel__ region."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "//%compiler: clang\n",
    "//%cflags: -fopenmp\n",
    "\n",
    "/*\n",
    "* name: reduction.6\n",
    "* type: C\n",
    "* version:    omp_5.1\n",
    "*/\n",
    "#include <stdio.h>\n",
    "\n",
    "int main (void)\n",
    "{\n",
    "  int a, i;\n",
    "\n",
    "  #pragma omp parallel shared(a) private(i)\n",
    "  {\n",
    "    #pragma omp masked\n",
    "    a = 0;\n",
    "\n",
    "    // To avoid race conditions, add a barrier here.\n",
    "\n",
    "    #pragma omp for reduction(+:a)\n",
    "    for (i = 0; i < 10; i++) {\n",
    "        a += i;\n",
    "    }\n",
    "\n",
    "    #pragma omp single\n",
    "    printf (\"Sum is %d\\n\", a);\n",
    "  }\n",
    "  return 0;\n",
    "}"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "!!%compiler: gfortran\n",
    "!!%cflags: -fopenmp\n",
    "\n",
    "! name: reduction.6\n",
    "! type: F-fixed\n",
    "! version:    omp_5.1\n",
    "      INTEGER A, I\n",
    "\n",
    "!$OMP PARALLEL SHARED(A) PRIVATE(I)\n",
    "\n",
    "!$OMP MASKED\n",
    "      A = 0\n",
    "!$OMP END MASKED\n",
    "\n",
    "      ! To avoid race conditions, add a barrier here.\n",
    "\n",
    "!$OMP DO REDUCTION(+:A)\n",
    "      DO I= 0, 9\n",
    "         A = A + I\n",
    "      END DO\n",
    "\n",
    "!$OMP SINGLE\n",
    "      PRINT *, \"Sum is \", A\n",
    "!$OMP END SINGLE\n",
    "\n",
    "!$OMP END PARALLEL\n",
    "\n",
    "      END"
   ]
  },
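  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The following sketch (an addition, not one of the numbered examples) applies the first fix described above to the C version of _reduction.6_. An explicit __barrier__ follows the __masked__ assignment, so the initialization of __a__ completes before any thread updates __a__ through the reduction. The function name _sum_0_to_9_ is hypothetical, and the sum is returned rather than printed so the result can be checked.\n",
    "\n",
    "```c\n",
    "/* Hypothetical wrapper around the corrected reduction.6 logic;\n",
    "   returns the sum instead of printing it. */\n",
    "int sum_0_to_9(void)\n",
    "{\n",
    "  int a, i;\n",
    "\n",
    "  #pragma omp parallel shared(a) private(i)\n",
    "  {\n",
    "    #pragma omp masked\n",
    "    a = 0;\n",
    "\n",
    "    /* masked has no implied barrier; this barrier ensures a = 0\n",
    "       completes before any thread updates a via the reduction */\n",
    "    #pragma omp barrier\n",
    "\n",
    "    #pragma omp for reduction(+:a)\n",
    "    for (i = 0; i < 10; i++) {\n",
    "      a += i;\n",
    "    }\n",
    "  }\n",
    "  return a;  /* 0 + 1 + ... + 9 */\n",
    "}\n",
    "```"
   ]
  },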
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The following example demonstrates the reduction of array _a_. In C/C++ this is illustrated by the explicit use of an array section _a[0:N]_ in the __reduction__ clause. The corresponding Fortran example uses array syntax supported in the base language. The OpenMP 4.5 specification did not permit an explicit array section in the __reduction__ clause in Fortran; this oversight was fixed in the OpenMP 5.0 specification."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "//%compiler: clang\n",
    "//%cflags: -fopenmp\n",
    "\n",
    "/*\n",
    "* name: reduction.7\n",
    "* type: C\n",
    "* version: omp_4.5\n",
    "*/\n",
    "#include <stdio.h>\n",
    "\n",
    "#define N 100\n",
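    "/* Fill b with sample values (b[i][j] = i + j) so the example links and runs. */\n",
    "void init(int n, float (*b)[N])\n",
    "{\n",
    "  int i, j;\n",
    "  for (i = 0; i < n; i++)\n",
    "    for (j = 0; j < N; j++)\n",
    "      b[i][j] = (float)(i + j);\n",
    "}\n",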
    "\n",
    "int main(){\n",
    "\n",
    "  int i,j;\n",
    "  float a[N], b[N][N];\n",
    "\n",
    "  init(N,b);\n",
    "\n",
    "  for(i=0; i<N; i++) a[i]=0.0e0;\n",
    "\n",
    "  #pragma omp parallel for reduction(+:a[0:N]) private(j)\n",
    "  for(i=0; i<N; i++){\n",
    "    for(j=0; j<N; j++){\n",
    "       a[j] +=  b[i][j];\n",
    "    }\n",
    "  }\n",
    "  printf(\" a[0] a[N-1]: %f %f\\n\", a[0], a[N-1]);\n",
    "\n",
    "  return 0;\n",
    "}"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "!!%compiler: gfortran\n",
    "!!%cflags: -fopenmp\n",
    "\n",
    "! name: reduction.7\n",
    "! type: F-free\n",
    "program array_red\n",
    "\n",
    "  integer,parameter :: n=100\n",
    "  integer           :: j\n",
    "  real              :: a(n), b(n,n)\n",
    "\n",
    "  call init(n,b)\n",
    "\n",
    "  a(:) = 0.0e0\n",
    "\n",
    "  !$omp parallel do reduction(+:a)\n",
    "  do j = 1, n\n",
    "     a(:) = a(:) + b(:,j)\n",
    "  end do\n",
    "\n",
    "  print*, \" a(1) a(n): \", a(1), a(n)\n",
    "\n",
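    "contains\n",
    "\n",
    "  ! Fill b with sample values (b(i,k) = (i-1) + (k-1)) so the example links and runs.\n",
    "  subroutine init(m, b)\n",
    "    integer, intent(in) :: m\n",
    "    real, intent(out)   :: b(m,m)\n",
    "    integer :: i, k\n",
    "    do k = 1, m\n",
    "      do i = 1, m\n",
    "        b(i,k) = real(i + k - 2)\n",
    "      end do\n",
    "    end do\n",
    "  end subroutine init\n",
    "\n",
    "end program"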
   ]
  },
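  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "As a further sketch of a C/C++ array-section reduction, the hypothetical _histogram_ function below reduces into an array that is reached only through a pointer. A bare pointer does not identify the array elements to be reduced, so the section _hist[0:nbins]_ supplies the extent. Each thread works on a private copy of the section, initialized to 0 (the identity for __+__), and the copies are summed into _hist_ at the end of the loop. The function and its parameters are illustrative assumptions, not part of the original examples.\n",
    "\n",
    "```c\n",
    "#include <stddef.h>\n",
    "\n",
    "/* Hypothetical helper: counts the values of v[0..n-1] that fall into\n",
    "   each bucket 0..nbins-1; other values are ignored. */\n",
    "void histogram(const int *v, size_t n, int *hist, int nbins)\n",
    "{\n",
    "  for (int b = 0; b < nbins; b++)\n",
    "    hist[b] = 0;\n",
    "\n",
    "  /* The array section gives the number of elements to privatize:\n",
    "     each thread updates its own zero-initialized copy of\n",
    "     hist[0:nbins], and the copies are summed into hist. */\n",
    "  #pragma omp parallel for reduction(+:hist[0:nbins])\n",
    "  for (size_t i = 0; i < n; i++) {\n",
    "    if (v[i] >= 0 && v[i] < nbins)\n",
    "      hist[v[i]]++;\n",
    "  }\n",
    "}\n",
    "```"
   ]
  },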
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Task Reduction"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "In OpenMP 5.0, the __task_reduction__ clause was introduced for the __taskgroup__ construct to allow reductions among explicit tasks that have an __in_reduction__ clause."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "In the _task_reduction.1_ example below, a reduction is performed as the algorithm traverses a linked list. Each reduction statement is executed by an explicit task generated by a __task__ construct, and the __in_reduction__ clause makes that task a participant in the reduction. A __taskgroup__ construct encloses the participating tasks and, through its __task_reduction__ clause, defines the reduction in which they participate. After the __taskgroup__ region, the original variable contains the final value of the reduction."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Note: The  _res_  variable is private in the  _linked_list_sum_  routine and is not required to be shared (as in the case of a __parallel__ construct reduction)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "//%compiler: clang\n",
    "//%cflags: -fopenmp\n",
    "\n",
    "/*\n",
    "* name:       task_reduction.1\n",
    "* type:       C\n",
    "*/\n",
    "\n",
    "#include<stdlib.h>\n",
    "#include<stdio.h>\n",
    "#define N 10\n",
    "\n",
    "typedef struct node_tag {\n",
    "    int val;\n",
    "    struct node_tag *next;\n",
    "} node_t;\n",
    "\n",
    "int linked_list_sum(node_t *p)\n",
    "{\n",
    "    int res = 0;\n",
    "\n",
    "    #pragma omp taskgroup task_reduction(+: res)\n",
    "    {\n",
    "        node_t* aux = p;\n",
    "        while(aux != 0)\n",
    "        {\n",
    "            #pragma omp task in_reduction(+: res)\n",
    "            res += aux->val;\n",
    "\n",
    "            aux = aux->next;\n",
    "        }\n",
    "    }\n",
    "    return res;\n",
    "}\n",
    "\n",
    "\n",
    "int main(int argc, char *argv[]) {\n",
    "    int i;\n",
    "//                           Create the root node.\n",
    "    node_t* root = (node_t*) malloc(sizeof(node_t));\n",
    "    root->val = 1;\n",
    "\n",
    "    node_t* aux = root;\n",
    "\n",
    "//                           Create N-1 more nodes.\n",
    "    for(i=2;i<=N;++i){\n",
    "        aux->next = (node_t*) malloc(sizeof(node_t));\n",
    "        aux = aux->next;\n",
    "        aux->val = i;\n",
    "    }\n",
    "\n",
    "    aux->next = 0;\n",
    "\n",
    "    #pragma omp parallel\n",
    "    #pragma omp single\n",
    "    {\n",
    "        int result = linked_list_sum(root);\n",
    "        printf( \"Calculated: %d  Analytic:%d\\n\", result, (N*(N+1)/2) );\n",
    "    }\n",
    "\n",
    "    return 0;\n",
    "}"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "!!%compiler: gfortran\n",
    "!!%cflags: -fopenmp\n",
    "\n",
    "! name:       task_reduction.1\n",
    "! type:       F-free\n",
    "\n",
    "module m\n",
    "    type node_t\n",
    "        integer :: val\n",
    "        type(node_t), pointer :: next\n",
    "    end type\n",
    "end module m\n",
    "\n",
    "function linked_list_sum(p) result(res)\n",
    "    use m\n",
    "    implicit none\n",
    "    type(node_t), pointer :: p\n",
    "    type(node_t), pointer :: aux\n",
    "    integer :: res\n",
    "\n",
    "    res = 0\n",
    "\n",
    "    !$omp taskgroup task_reduction(+: res)\n",
    "        aux => p\n",
    "        do while (associated(aux))\n",
    "            !$omp task in_reduction(+: res)\n",
    "                res = res + aux%val\n",
    "            !$omp end task\n",
    "            aux => aux%next\n",
    "        end do\n",
    "    !$omp end taskgroup\n",
    "end function linked_list_sum\n",
    "\n",
    "\n",
    "program main\n",
    "    use m\n",
    "    implicit none\n",
    "    type(node_t), pointer :: root, aux\n",
    "    integer :: res, i\n",
    "    integer, parameter :: N=10\n",
    "\n",
    "    interface\n",
    "        function linked_list_sum(p) result(res)\n",
    "            use m\n",
    "            implicit none\n",
    "            type(node_t), pointer :: p\n",
    "            integer :: res\n",
    "        end function\n",
    "    end interface\n",
    "!                       Create the root node.\n",
    "    allocate(root)\n",
    "    root%val = 1\n",
    "    aux => root\n",
    "\n",
    "!                       Create N-1 more nodes.\n",
    "    do i = 2,N\n",
    "        allocate(aux%next)\n",
    "        aux => aux%next\n",
    "        aux%val = i\n",
    "    end do\n",
    "\n",
    "    aux%next => null()\n",
    "\n",
    "    !$omp parallel\n",
    "    !$omp single\n",
    "        res = linked_list_sum(root)\n",
    "        print *, \"Calculated:\", res, \" Analytic:\", (N*(N+1))/2\n",
    "    !$omp end single\n",
    "    !$omp end parallel\n",
    "\n",
    "end program main"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "In OpenMP 5.0 the __task__  _reduction-modifier_  for the __reduction__ clause was introduced to provide a means of performing reductions among implicit and explicit tasks."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The __reduction__ clause of a __parallel__ or worksharing construct may specify the __task__ _reduction-modifier_ to include explicit task reductions within its region, provided the reduction operators (_reduction-identifiers_) and variables (_list items_) of the participating tasks match those of the implicit tasks."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "There are two reduction use cases (identified by USE CASE #) in the _task_reduction.2_ example below."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "In USE CASE 1, a __task__ modifier in the __reduction__ clause of the __parallel__ construct is used to include the reductions of any participating tasks, namely those with an __in_reduction__ clause and matching _reduction-identifiers_ (__+__) and list items (__x__)."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Note that a __taskgroup__ construct (with a __task_reduction__ clause) is not necessary to scope the explicit task reduction, as it was in the example above. Hence, even without the implicit task reduction statement (the C __x++__ and Fortran __x=x+1__ statements), the __task__ _reduction-modifier_ in a __reduction__ clause of the __parallel__ construct can be used to avoid having to create a __taskgroup__ construct (and its __task_reduction__ clause) around the task-generating structure."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "In USE CASE 2 tasks participating in the reduction are within a worksharing region (a parallel worksharing-loop construct). Here, too, no __taskgroup__ is required, and the  _reduction-identifier_  (__+__) and list item (variable __x__) match as required."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "//%compiler: clang\n",
    "//%cflags: -fopenmp\n",
    "\n",
    "/*\n",
    "* name:       task_reduction.2\n",
    "* type:       C\n",
    "* version: omp_5.0\n",
    "*/\n",
    "#include <stdio.h>\n",
    "int main(void){\n",
    "   int N=100, M=10;\n",
    "   int i, x;\n",
    "\n",
    "// USE CASE 1  explicit-task reduction + parallel reduction clause\n",
    "   x=0;\n",
    "   #pragma omp parallel num_threads(M) reduction(task,+:x)\n",
    "   {\n",
    "\n",
    "     x++;                // implicit task reduction statement\n",
    "\n",
    "     #pragma omp single\n",
    "     for(i=0;i<N;i++)\n",
    "       #pragma omp task in_reduction(+:x)\n",
    "       x++;\n",
    "\n",
    "   }\n",
    "   printf(\"x=%d  =M+N\\n\",x);  // x= 110  =M+N\n",
    "\n",
    "\n",
    "// USE CASE 2  task reduction +  worksharing reduction clause\n",
    "   x=0;\n",
    "   #pragma omp parallel for num_threads(M) reduction(task,+:x)\n",
    "   for(i=0; i< N; i++){\n",
    "\n",
    "      x++;\n",
    "\n",
    "      if( i%2 == 0){\n",
    "       #pragma omp task in_reduction(+:x)\n",
    "       x--;\n",
    "      }\n",
    "   }\n",
    "   printf(\"x=%d  =N-N/2\\n\",x);  // x= 50  =N-N/2\n",
    "\n",
    "   return 0;\n",
    "}"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "!!%compiler: gfortran\n",
    "!!%cflags: -fopenmp\n",
    "\n",
    "! name:       task_reduction.2\n",
    "! type:       F-free\n",
    "! version:    omp_5.0\n",
    "\n",
    "program task_modifier\n",
    "\n",
    "   integer :: N=100, M=10\n",
    "   integer :: i, x\n",
    "\n",
    "! USE CASE 1  explicit-task reduction + parallel reduction clause\n",
    "   x=0\n",
    "   !$omp parallel num_threads(M) reduction(task,+:x)\n",
    "\n",
    "     x=x+1                   !! implicit task reduction statement\n",
    "\n",
    "     !$omp single\n",
    "       do i = 1,N\n",
    "         !$omp task in_reduction(+:x)\n",
    "           x=x+1\n",
    "         !$omp end task\n",
    "       end do\n",
    "     !$omp end single\n",
    "\n",
    "   !$omp end parallel\n",
    "   write(*,'(\"x=\",I0,\" =M+N\")') x   ! x= 110 =M+N\n",
    "\n",
    "\n",
    "! USE CASE 2  task reduction +  worksharing reduction clause\n",
    "   x=0\n",
    "   !$omp parallel do num_threads(M) reduction(task,+:x)\n",
    "     do i = 1,N\n",
    "\n",
    "        x=x+1\n",
    "\n",
    "        if( mod(i,2) == 0) then\n",
    "           !$omp task in_reduction(+:x)\n",
    "             x=x-1\n",
    "           !$omp end task\n",
    "        endif\n",
    "\n",
    "     end do\n",
    "   write(*,'(\"x=\",I0,\"  =N-N/2\")') x   ! x= 50 =N-N/2\n",
    "\n",
    "end program"
   ]
  },
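  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "To illustrate the note on USE CASE 1, the sketch below uses only explicit tasks. No statement in the implicit tasks updates _sum_, and no __taskgroup__ is needed because the __task__ _reduction-modifier_ on the __parallel__ construct defines the task reduction. The function name _task_sum_ and its parameters are hypothetical.\n",
    "\n",
    "```c\n",
    "/* Hypothetical example: sums v[0..n-1] with one explicit task per\n",
    "   element; no taskgroup or task_reduction clause is used. */\n",
    "int task_sum(const int *v, int n)\n",
    "{\n",
    "  int sum = 0;\n",
    "\n",
    "  #pragma omp parallel reduction(task, +:sum)\n",
    "  {\n",
    "    /* one thread generates the tasks; the implicit tasks\n",
    "       themselves do not update sum */\n",
    "    #pragma omp single\n",
    "    for (int i = 0; i < n; i++) {\n",
    "      #pragma omp task in_reduction(+:sum)\n",
    "      sum += v[i];\n",
    "    }\n",
    "  }\n",
    "  return sum;\n",
    "}\n",
    "```"
   ]
  },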
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Reduction on Combined Target Constructs"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "When a __reduction__ clause appears on a combined construct that combines a __target__ construct with another construct, there is an implicit map of the list items with a __tofrom__ map type for the __target__ construct. Otherwise, the list items (if they are scalar variables) would be treated as firstprivate by default in the __target__ construct, which is unlikely to provide the intended behavior, since the result of the reduction in the firstprivate variable would be discarded at the end of the __target__ region."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "In the following example, the __reduction__ clause on __sum1__ or __sum2__ results, by default, in an implicit __tofrom__ map for that variable. As long as neither __sum1__ nor __sum2__ is already present on the device, this mapping behavior ensures that the value of __sum1__ computed in the first __target__ construct is used in the second __target__ construct."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "//%compiler: clang\n",
    "//%cflags: -fopenmp\n",
    "\n",
    "/*\n",
    "* name: target_reduction.1\n",
    "* type: C\n",
    "* version: omp_5.0\n",
    "*/\n",
    "#include <stdio.h>\n",
    "int f(int);\n",
    "int g(int);\n",
    "int main()\n",
    "{\n",
    "   int sum1=0, sum2=0;\n",
    "   int i;\n",
    "   const int n = 100;\n",
    "\n",
    "   #pragma omp target teams distribute reduction(+:sum1)\n",
    "   for (int i = 0; i < n; i++) {\n",
    "      sum1 += f(i);\n",
    "   }\n",
    "\n",
    "   #pragma omp target teams distribute reduction(+:sum2)\n",
    "   for (int i = 0; i < n; i++) {\n",
    "      sum2 += g(i) * sum1;\n",
    "   }\n",
    "\n",
    "   printf(  \"sum1 = %d, sum2 = %d\\n\", sum1, sum2);\n",
    "   //OUTPUT: sum1 = 9900, sum2 = 147015000\n",
    "   return 0;\n",
    "}\n",
    "\n",
    "int f(int res){ return res*2; }\n",
    "int g(int res){ return res*3; }"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "!!%compiler: gfortran\n",
    "!!%cflags: -fopenmp\n",
    "\n",
    "! name: target_reduction.1\n",
    "! type: F-free\n",
    "! version: omp_5.0\n",
    "program target_reduction_ex1\n",
    "   interface\n",
    "      function f(res)\n",
    "             integer :: f, res\n",
    "          end function\n",
    "      function g(res)\n",
    "             integer :: g, res\n",
    "          end function\n",
    "   end interface\n",
    "   integer :: sum1, sum2, i\n",
    "   integer, parameter :: n = 100\n",
    "   sum1 = 0\n",
    "   sum2 = 0\n",
    "   !$omp target teams distribute reduction(+:sum1)\n",
    "       do i=1,n\n",
    "          sum1 = sum1 + f(i)\n",
    "       end do\n",
    "   !$omp target teams distribute reduction(+:sum2)\n",
    "       do i=1,n\n",
    "          sum2 = sum2 + g(i)*sum1\n",
    "       end do\n",
    "   print *, \"sum1 = \", sum1, \", sum2 = \", sum2\n",
    "   !!OUTPUT: sum1 =     10100 , sum2 = 153015000\n",
    "end program\n",
    "\n",
    "\n",
    "integer function f(res)\n",
    "   integer :: res\n",
    "   f = res*2\n",
    "end function\n",
    "integer function g(res)\n",
    "   integer :: res\n",
    "   g = res*3\n",
    "end function"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "In the next example, the variables __sum1__ and __sum2__ remain on the device for the duration of the __target__ __data__ region, so it is their device copies that are updated by the reductions. Note the significance of mapping __sum1__ on the second __target__ construct. Otherwise, it would be treated by default as firstprivate, and the result computed for __sum1__ in the prior __target__ region might not be used. Alternatively, a __target__ __update__ construct could be used between the two __target__ constructs to update the host version of __sum1__ with the value in the corresponding device version after the first construct completes."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "//%compiler: clang\n",
    "//%cflags: -fopenmp\n",
    "\n",
    "/*\n",
    "* name: target_reduction.2\n",
    "* type: C\n",
    "* version: omp_5.0\n",
    "*/\n",
    "#include <stdio.h>\n",
    "int f(int);\n",
    "int g(int);\n",
    "int main()\n",
    "{\n",
    "   int sum1=0, sum2=0;\n",
    "   int i;\n",
    "   const int n = 100;\n",
    "\n",
    "   #pragma omp target data map(sum1,sum2)\n",
    "   {\n",
    "      #pragma omp target teams distribute reduction(+:sum1)\n",
    "      for (int i = 0; i < n; i++) {\n",
    "         sum1 += f(i);\n",
    "      }\n",
    "\n",
    "      #pragma omp target teams distribute map(sum1) reduction(+:sum2)\n",
    "      for (int i = 0; i < n; i++) {\n",
    "         sum2 += g(i) * sum1;\n",
    "      }\n",
    "   }\n",
    "   printf(  \"sum1 = %d, sum2 = %d\\n\", sum1, sum2);\n",
    "   //OUTPUT: sum1 = 9900, sum2 = 147015000\n",
    "   return 0;\n",
    "}\n",
    "\n",
    "int f(int res){ return res*2; }\n",
    "int g(int res){ return res*3; }"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "!!%compiler: gfortran\n",
    "!!%cflags: -fopenmp\n",
    "\n",
    "! name: target_reduction.2\n",
    "! type: F-free\n",
    "! version: omp_5.0\n",
    "\n",
    "program target_reduction_ex2\n",
    "   interface\n",
    "      function f(res)\n",
    "             integer :: f, res\n",
    "          end function\n",
    "      function g(res)\n",
    "             integer :: g, res\n",
    "          end function\n",
    "   end interface\n",
    "   integer :: sum1, sum2, i\n",
    "   integer, parameter :: n = 100\n",
    "   sum1 = 0\n",
    "   sum2 = 0\n",
    "   !$omp target data map(sum1, sum2)\n",
    "       !$omp target teams distribute reduction(+:sum1)\n",
    "           do i=1,n\n",
    "              sum1 = sum1 + f(i)\n",
    "           end do\n",
    "       !$omp target teams distribute map(sum1) reduction(+:sum2)\n",
    "           do i=1,n\n",
    "              sum2 = sum2 + g(i)*sum1\n",
    "           end do\n",
    "   !$omp end target data\n",
    "   print *, \"sum1 = \", sum1, \", sum2 = \", sum2\n",
    "   !!OUTPUT: sum1 =     10100 , sum2 = 153015000\n",
    "end program\n",
    "\n",
    "\n",
    "integer function f(res)\n",
    "   integer :: res\n",
    "   f = res*2\n",
    "end function\n",
    "integer function g(res)\n",
    "   integer :: res\n",
    "   g = res*3\n",
    "end function"
   ]
  },
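  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The __target__ __update__ alternative mentioned above can be sketched as follows. The second __target__ construct omits __map(sum1)__, so __sum1__ is firstprivate there, and a __target update from(sum1)__ copies the device value back to the host before that construct initializes its firstprivate copy. The _two_sums_ wrapper and its output parameters are hypothetical. __f__ and __g__ match the functions in _target_reduction.2_ but are placed in a __declare target__ region here, so the sketch does not rely on implicit declare-target behavior.\n",
    "\n",
    "```c\n",
    "/* f and g are compiled for the device as well as the host. */\n",
    "#pragma omp declare target\n",
    "int f(int res) { return res * 2; }\n",
    "int g(int res) { return res * 3; }\n",
    "#pragma omp end declare target\n",
    "\n",
    "/* Hypothetical wrapper: returns both sums through pointers. */\n",
    "void two_sums(int n, int *out1, int *out2)\n",
    "{\n",
    "  int sum1 = 0, sum2 = 0;\n",
    "\n",
    "  #pragma omp target data map(sum1, sum2)\n",
    "  {\n",
    "    #pragma omp target teams distribute reduction(+:sum1)\n",
    "    for (int i = 0; i < n; i++)\n",
    "      sum1 += f(i);\n",
    "\n",
    "    /* copy the device value of sum1 back to the host */\n",
    "    #pragma omp target update from(sum1)\n",
    "\n",
    "    /* sum1 is firstprivate here and is initialized from\n",
    "       the updated host value */\n",
    "    #pragma omp target teams distribute reduction(+:sum2)\n",
    "    for (int i = 0; i < n; i++)\n",
    "      sum2 += g(i) * sum1;\n",
    "  }\n",
    "  *out1 = sum1;\n",
    "  *out2 = sum2;\n",
    "}\n",
    "```\n",
    "\n",
    "With _n_ = 100, this sketch should produce the same values as the C version of _target_reduction.2_ (sum1 = 9900, sum2 = 147015000)."
   ]
  },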
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Task Reduction with Target Constructs"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The following examples illustrate how task reductions can apply to target tasks that result from a __target__ construct with the __in_reduction__ clause. Here, the __in_reduction__ clause specifies that the target task participates in the task reduction defined in the scope of the enclosing __taskgroup__ construct. Partial results from all tasks participating in the task reduction will be combined (in some order) into the original variable listed in the __task_reduction__ clause before exiting the __taskgroup__ region."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "//%compiler: clang\n",
    "//%cflags: -fopenmp\n",
    "\n",
    "/*\n",
    "* name: target_task_reduction.1\n",
    "* type: C\n",
    "* version: omp_5.2\n",
    "*/\n",
    "\n",
    "#include <stdio.h>\n",
    "void device_compute(int *);\n",
    "#pragma omp declare target enter(device_compute)\n",
    "void host_compute(int *);\n",
    "int main()\n",
    "{\n",
    "   int sum = 0;\n",
    "\n",
    "   #pragma omp parallel masked\n",
    "   #pragma omp taskgroup task_reduction(+:sum)\n",
    "   {\n",
    "      #pragma omp target in_reduction(+:sum) nowait\n",
    "          device_compute(&sum);\n",
    "\n",
    "      #pragma omp task in_reduction(+:sum)\n",
    "          host_compute(&sum);\n",
    "   }\n",
    "   printf(  \"sum = %d\\n\", sum);\n",
    "   //OUTPUT: sum = 2\n",
    "   return 0;\n",
    "}\n",
    "\n",
    "void device_compute(int *sum){ *sum = 1; }\n",
    "void   host_compute(int *sum){ *sum = 1; }"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "!!%compiler: gfortran\n",
    "!!%cflags: -fopenmp\n",
    "\n",
    "! name: target_task_reduction.1\n",
    "! type: F-free\n",
    "! version: omp_5.2\n",
    "\n",
    "program target_task_reduction_ex1\n",
    "   interface\n",
    "      subroutine device_compute(res)\n",
    "      !$omp declare target enter(device_compute)\n",
    "        integer :: res\n",
    "      end subroutine device_compute\n",
    "      subroutine host_compute(res)\n",
    "        integer :: res\n",
    "      end subroutine host_compute\n",
    "   end interface\n",
    "   integer :: sum\n",
    "   sum = 0\n",
    "   !$omp parallel masked\n",
    "      !$omp taskgroup task_reduction(+:sum)\n",
    "         !$omp target in_reduction(+:sum) nowait\n",
    "            call device_compute(sum)\n",
    "         !$omp end target\n",
    "         !$omp task in_reduction(+:sum)\n",
    "            call host_compute(sum)\n",
    "         !$omp end task\n",
    "      !$omp end taskgroup\n",
    "   !$omp end parallel masked\n",
    "   print *, \"sum = \", sum\n",
    "   !!OUTPUT: sum = 2\n",
    "end program\n",
    "\n",
    "subroutine device_compute(sum)\n",
    "   integer :: sum\n",
    "   sum = 1\n",
    "end subroutine\n",
    "subroutine host_compute(sum)\n",
    "   integer :: sum\n",
    "   sum = 1\n",
    "end subroutine"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "In the next pair of examples, the task reduction is defined by a __reduction__ clause with the __task__ modifier, rather than a __task_reduction__ clause on a __taskgroup__ construct. Again, the partial results from the participating tasks will be combined in some order into the original reduction variable, __sum__."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "//%compiler: clang\n",
    "//%cflags: -fopenmp\n",
    "\n",
    "/*\n",
    "* name: target_task_reduction.2a\n",
    "* type: C\n",
    "* version: omp_5.2\n",
    "*/\n",
    "#include <stdio.h>\n",
    "extern void device_compute(int *);\n",
    "#pragma omp declare target enter(device_compute)\n",
    "extern void host_compute(int *);\n",
    "int main()\n",
    "{\n",
    "   int sum = 0;\n",
    "\n",
    "   #pragma omp parallel sections reduction(task, +:sum)\n",
    "   {\n",
    "      #pragma omp section\n",
    "          {\n",
    "             #pragma omp target in_reduction(+:sum)\n",
    "             device_compute(&sum);\n",
    "          }\n",
    "      #pragma omp section\n",
    "          {\n",
    "             host_compute(&sum);\n",
    "          }\n",
    "   }\n",
    "   printf(  \"sum = %d\\n\", sum);\n",
    "   //OUTPUT: sum = 2\n",
    "   return 0;\n",
    "}\n",
    "\n",
    "void device_compute(int *sum){ *sum = 1; }\n",
    "void   host_compute(int *sum){ *sum = 1; }"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "!!%compiler: gfortran\n",
    "!!%cflags: -fopenmp\n",
    "\n",
    "! name: target_task_reduction.2a\n",
    "! type: F-free\n",
    "! version: omp_5.2\n",
    "\n",
    "program target_task_reduction_ex2\n",
    "   interface\n",
    "      subroutine device_compute(res)\n",
    "      !$omp declare target enter(device_compute)\n",
    "        integer :: res\n",
    "      end subroutine device_compute\n",
    "      subroutine host_compute(res)\n",
    "        integer :: res\n",
    "      end subroutine host_compute\n",
    "   end interface\n",
    "   integer :: sum\n",
    "   sum = 0\n",
    "   !$omp parallel sections reduction(task,+:sum)\n",
    "      !$omp section\n",
    "         !$omp target in_reduction(+:sum) nowait\n",
    "           call device_compute(sum)\n",
    "         !$omp end target\n",
    "      !$omp section\n",
    "         call host_compute(sum)\n",
    "   !$omp end parallel sections\n",
    "   print *, \"sum = \", sum\n",
    "   !!OUTPUT: sum = 2\n",
    "end program\n",
    "\n",
    "subroutine device_compute(sum)\n",
    "   integer :: sum\n",
    "   sum = 1\n",
    "end subroutine\n",
    "subroutine host_compute(sum)\n",
    "   integer :: sum\n",
    "   sum = 1\n",
    "end subroutine"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Next, the __task__ modifier is again used to define a task reduction over participating tasks. This time, the participating tasks are a target task resulting from a __target__ construct with the __in_reduction__ clause, and the implicit task (executing on the primary thread) that calls __host_compute__. As before, the partial results from these participating tasks are combined in some order into the original reduction variable."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "//%compiler: clang\n",
    "//%cflags: -fopenmp\n",
    "\n",
    "/*\n",
    "* name: target_task_reduction.2b\n",
    "* type: C\n",
    "* version: omp_5.2\n",
    "*/\n",
    "#include <stdio.h>\n",
    "extern void device_compute(int *);\n",
    "#pragma omp declare target enter(device_compute)\n",
    "extern void host_compute(int *);\n",
    "int main()\n",
    "{\n",
    "   int sum = 0;\n",
    "\n",
    "   #pragma omp parallel masked reduction(task, +:sum)\n",
    "   {\n",
    "       #pragma omp target in_reduction(+:sum) nowait\n",
    "       device_compute(&sum);\n",
    "\n",
    "       host_compute(&sum);\n",
    "   }\n",
    "   printf(  \"sum = %d\\n\", sum);\n",
    "   //OUTPUT: sum = 2\n",
    "   return 0;\n",
    "}\n",
    "\n",
    "void device_compute(int *sum){ *sum = 1; }\n",
    "void   host_compute(int *sum){ *sum = 1; }"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "!!%compiler: gfortran\n",
    "!!%cflags: -fopenmp\n",
    "\n",
    "! name: target_task_reduction.2b\n",
    "! type: F-free\n",
    "! version: omp_5.2\n",
    "\n",
    "program target_task_reduction_ex2b\n",
    "   interface\n",
    "      subroutine device_compute(res)\n",
    "      !$omp declare target enter(device_compute)\n",
    "        integer :: res\n",
    "      end subroutine device_compute\n",
    "      subroutine host_compute(res)\n",
    "        integer :: res\n",
    "      end subroutine host_compute\n",
    "   end interface\n",
    "   integer :: sum\n",
    "   sum = 0\n",
    "   !$omp parallel masked reduction(task,+:sum)\n",
    "         !$omp target in_reduction(+:sum) nowait\n",
    "           call device_compute(sum)\n",
    "         !$omp end target\n",
    "         call host_compute(sum)\n",
    "   !$omp end parallel masked\n",
    "   print *, \"sum = \", sum\n",
    "   !!OUTPUT: sum = 2\n",
    "end program\n",
    "\n",
    "\n",
    "subroutine device_compute(sum)\n",
    "   integer :: sum\n",
    "   sum = 1\n",
    "end subroutine\n",
    "subroutine host_compute(sum)\n",
    "   integer :: sum\n",
    "   sum = 1\n",
    "end subroutine"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Taskloop Reduction"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "In the OpenMP 5.0 Specification, the __taskloop__ construct was extended to support reductions."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The following two examples show two different ways to implement a reduction over an array with a taskloop reduction. In the first example, the __reduction__ clause is applied to the __taskloop__ construct. As explained above for the task reduction examples, a reduction over tasks consists of two components: the scope of the reduction, which is defined by a __taskgroup__ region, and the tasks that participate in the reduction. In this example, the __reduction__ clause defines both. First, it specifies that the implicit __taskgroup__ region associated with the __taskloop__ construct is the scope of the reduction; second, it makes all tasks created by the __taskloop__ construct participants in the reduction. Regarding the first property, note that adding the __nogroup__ clause to this __taskloop__ construct would make the code nonconforming: without the implicit __taskgroup__ region, the generated tasks would participate in a reduction whose scope has not been defined."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "//%compiler: clang\n",
    "//%cflags: -fopenmp\n",
    "\n",
    "/*\n",
    "* name:       taskloop_reduction.1\n",
    "* type:       C\n",
    "* version:    omp_5.0\n",
    "*/\n",
    "#include <stdio.h>\n",
    "\n",
    "int array_sum(int n, int *v) {\n",
    "    int i;\n",
    "    int res = 0;\n",
    "\n",
    "    #pragma omp taskloop reduction(+: res)\n",
    "    for(i = 0; i < n; ++i)\n",
    "        res += v[i];\n",
    "\n",
    "    return res;\n",
    "}\n",
    "\n",
    "int main(int argc, char *argv[]) {\n",
    "    int n = 10;\n",
    "    int v[10] = {1,2,3,4,5,6,7,8,9,10};\n",
    "\n",
    "    #pragma omp parallel\n",
    "    #pragma omp single\n",
    "    {\n",
    "        int res = array_sum(n, v);\n",
    "        printf(\"The result is %d\\n\", res);\n",
    "    }\n",
    "    return 0;\n",
    "}"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "!!%compiler: gfortran\n",
    "!!%cflags: -fopenmp\n",
    "\n",
    "! name: taskloop_reduction.1\n",
    "! type: F-free\n",
    "! version:    omp_5.0\n",
    "function array_sum(n, v) result(res)\n",
    "    implicit none\n",
    "    integer :: n, v(n), res\n",
    "    integer :: i\n",
    "\n",
    "    res = 0\n",
    "    !$omp taskloop reduction(+: res)\n",
    "    do i=1, n\n",
    "        res = res + v(i)\n",
    "    end do\n",
    "    !$omp end taskloop\n",
    "\n",
    "end function array_sum\n",
    "\n",
    "program main\n",
    "    implicit none\n",
    "    integer :: n, v(10), res\n",
    "    integer :: i\n",
    "\n",
    "    integer, external :: array_sum\n",
    "\n",
    "    n = 10\n",
    "    do i=1, n\n",
    "        v(i) = i\n",
    "    end do\n",
    "\n",
    "    !$omp parallel\n",
    "    !$omp single\n",
    "    res = array_sum(n, v)\n",
    "    print *, \"The result is\", res\n",
    "    !$omp end single\n",
    "    !$omp end parallel\n",
    "end program main"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The second example computes exactly the same value as the preceding _taskloop_reduction.1_ code, but in a very different way. First, the _array_sum_ function creates a __taskgroup__ region whose __task_reduction__ clause defines the scope of a new reduction. Then a task, and the tasks generated by a taskloop, participate in that reduction through the __in_reduction__ clause on the __task__ and __taskloop__ constructs, respectively. Note that the __nogroup__ clause was added to the __taskloop__ construct. This is allowed because the __in_reduction__ clause expresses something different from the __reduction__ clause: with __in_reduction__, the generated tasks participate in a previously declared reduction, whereas with __reduction__, a new reduction is created and all tasks generated by the taskloop participate in it."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "//%compiler: clang\n",
    "//%cflags: -fopenmp\n",
    "\n",
    "/*\n",
    "* name:       taskloop_reduction.2\n",
    "* type:       C\n",
    "* version:    omp_5.0\n",
    "*/\n",
    "#include <stdio.h>\n",
    "\n",
    "int array_sum(int n, int *v) {\n",
    "    int i;\n",
    "    int res = 0;\n",
    "\n",
    "    #pragma omp taskgroup task_reduction(+: res)\n",
    "    {\n",
    "        if (n > 0) {\n",
    "            #pragma omp task in_reduction(+: res)\n",
    "            res = res + v[0];\n",
    "\n",
    "            #pragma omp taskloop in_reduction(+: res) nogroup\n",
    "            for(i = 1; i < n; ++i)\n",
    "                res += v[i];\n",
    "        }\n",
    "    }\n",
    "\n",
    "    return res;\n",
    "}\n",
    "\n",
    "int main(int argc, char *argv[]) {\n",
    "    int n = 10;\n",
    "    int v[10] = {1,2,3,4,5,6,7,8,9,10};\n",
    "\n",
    "    #pragma omp parallel\n",
    "    #pragma omp single\n",
    "    {\n",
    "        int res = array_sum(n, v);\n",
    "        printf(\"The result is %d\\n\", res);\n",
    "    }\n",
    "    return 0;\n",
    "}"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "!!%compiler: gfortran\n",
    "!!%cflags: -fopenmp\n",
    "\n",
    "! name: taskloop_reduction.2\n",
    "! type: F-free\n",
    "! version:    omp_5.0\n",
    "function array_sum(n, v) result(res)\n",
    "    implicit none\n",
    "    integer :: n, v(n), res\n",
    "    integer :: i\n",
    "\n",
    "    res = 0\n",
    "    !$omp taskgroup task_reduction(+: res)\n",
    "    if (n > 0) then\n",
    "        !$omp task in_reduction(+: res)\n",
    "        res = res + v(1)\n",
    "        !$omp end task\n",
    "\n",
    "        !$omp taskloop in_reduction(+: res) nogroup\n",
    "        do i=2, n\n",
    "            res = res + v(i)\n",
    "        end do\n",
    "        !$omp end taskloop\n",
    "    endif\n",
    "    !$omp end taskgroup\n",
    "\n",
    "end function array_sum\n",
    "\n",
    "program main\n",
    "    implicit none\n",
    "    integer :: n, v(10), res\n",
    "    integer :: i\n",
    "\n",
    "    integer, external :: array_sum\n",
    "\n",
    "    n = 10\n",
    "    do i=1, n\n",
    "        v(i) = i\n",
    "    end do\n",
    "\n",
    "    !$omp parallel\n",
    "    !$omp single\n",
    "    res = array_sum(n, v)\n",
    "    print *, \"The result is\", res\n",
    "    !$omp end single\n",
    "    !$omp end parallel\n",
    "end program main"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "In the OpenMP 5.0 Specification, __reduction__ clauses for the __taskloop simd__ construct were also added."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The examples below compare reductions for the __taskloop__ and the __taskloop__ __simd__ constructs. They illustrate the use of __reduction__ clauses on \"stand-alone\" __taskloop__ constructs, and the use of __in_reduction__ clauses that let the tasks of a taskloop participate in other reductions within the scope of a parallel region."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**taskloop reductions:**"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "In the _taskloop reductions_ section of the example below, _taskloop 1_ uses the __reduction__ clause on a __taskloop__ construct for a sum reduction, accumulated in _asum_. The behavior is as though a __taskgroup__ construct with a __task_reduction__ clause enclosed the taskloop region, and each taskloop task had an __in_reduction__ clause with the same specifications as the __reduction__ clause. At the end of the taskloop region, _asum_ contains the result of the reduction."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The next taskloop, _taskloop 2_, illustrates the use of the __in_reduction__ clause to participate in a reduction scope previously defined by a __parallel__ construct."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The task reductions of _task 2_ and _taskloop 2_ are combined across the __taskloop__ construct and the __task__ construct, as specified by the __reduction(task, +:asum)__ clause of the __parallel__ construct. At the end of the parallel region, _asum_ contains the combined result of all reductions."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**taskloop simd reductions:**"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Reductions for the __taskloop__ __simd__ construct are shown in the second half of the code. The __taskloop__ __simd__ construct is a composite construct, and since each of its component constructs, __taskloop__ and __simd__, can accept a reduction-type clause, the exact way the reduction clause applies is defined in the __taskloop__ __simd__ construct section of the OpenMP 5.0 Specification. The code below illustrates use cases for these reductions."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "In the _taskloop simd reductions_ section of the example below, _taskloop simd 3_ uses the __reduction__ clause on a __taskloop__ __simd__ construct for a sum reduction within a loop. In this case, the __reduction__ clause is used just as one would use it on a __simd__ construct. The SIMD reductions within each task are combined, and the results of these tasks are further combined just as for the __taskloop__ construct with the __reduction__ clause in _taskloop 1_. At the end of the taskloop region, _asum_ contains the combined result of all reductions."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "If a __taskloop__ __simd__ construct is to participate in a previously defined reduction scope, the participation should be specified with an __in_reduction__ clause, as shown in the __parallel__ region enclosing the _task 4_ and _taskloop simd 4_ code sections."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Here, the __in_reduction__ clause of the __taskloop__ __simd__ construct specifies that the construct's tasks participate in a task reduction within the scope of the parallel region. That is, the result of each task of the __taskloop__ component construct contributes to the reduction at a broader level, just as in the _parallel reduction a_ code section above. Also, each __simd__ component construct behaves as if it had a __reduction__ clause, and the SIMD results within each task are combined to form a single result for that task (which participates through the __in_reduction__ clause). At the end of the parallel region, _asum_ contains the combined result of all reductions."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "//%compiler: clang\n",
    "//%cflags: -fopenmp\n",
    "\n",
    "/*\n",
    "* name: taskloop_simd_reduction.1\n",
    "* type: C\n",
    "* version: omp_5.1\n",
    "*/\n",
    "#include <stdio.h>\n",
    "#define N 100\n",
    "\n",
    "int main(){\n",
    "   int i, a[N], asum=0;\n",
    "\n",
    "   for(i=0;i<N;i++) a[i]=i;\n",
    "\n",
    "   // taskloop reductions\n",
    "\n",
    "   #pragma omp parallel masked\n",
    "   #pragma omp taskloop reduction(+:asum) // taskloop 1\n",
    "      for(i=0;i<N;i++){ asum += a[i]; }\n",
    "\n",
    "\n",
    "   #pragma omp parallel reduction(task, +:asum) // parallel reduction a\n",
    "   {\n",
    "      #pragma omp masked\n",
    "      #pragma omp task            in_reduction(+:asum) // task 2\n",
    "         for(i=0;i<N;i++){ asum += a[i]; }\n",
    "\n",
    "      #pragma omp masked taskloop in_reduction(+:asum) // taskloop 2\n",
    "         for(i=0;i<N;i++){ asum += a[i]; }\n",
    "   }\n",
    "\n",
    "   // taskloop simd reductions\n",
    "\n",
    "   #pragma omp parallel masked\n",
    "   #pragma omp taskloop simd reduction(+:asum) // taskloop simd 3\n",
    "     for(i=0;i<N;i++){ asum += a[i]; }\n",
    "\n",
    "\n",
    "   #pragma omp parallel reduction(task, +:asum) // parallel reduction b\n",
    "   {\n",
    "      #pragma omp masked\n",
    "      #pragma omp task                 in_reduction(+:asum) // task 4\n",
    "         for(i=0;i<N;i++){ asum += a[i]; }\n",
    "\n",
    "      #pragma omp masked taskloop simd in_reduction(+:asum) // taskloop\n",
    "         for(i=0;i<N;i++){ asum += a[i]; }                  // simd 4\n",
    "\n",
    "  }\n",
    "\n",
    "  printf(\"asum=%d \\n\",asum); // output: asum=29700\n",
    "}"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "!!%compiler: gfortran\n",
    "!!%cflags: -fopenmp\n",
    "\n",
    "! name: taskloop_simd_reduction.1\n",
    "! type: F-free\n",
    "! version: omp_5.1\n",
    "program main\n",
    "\n",
    "  use omp_lib\n",
    "  integer, parameter ::  N=100\n",
    "  integer            :: i, a(N), asum=0\n",
    "\n",
    "  a = [( i, i=1,N )]    !! initialize\n",
    "\n",
    "!! taskloop reductions\n",
    "\n",
    "  !$omp parallel masked\n",
    "  !$omp taskloop reduction(+:asum)                  !! taskloop 1\n",
    "    do i=1,N;  asum = asum + a(i);  enddo\n",
    "  !$omp end taskloop\n",
    "  !$omp end parallel masked\n",
    "\n",
    "\n",
    "  !$omp parallel reduction(task, +:asum)            !! parallel reduction a\n",
    "\n",
    "     !$omp masked\n",
    "     !$omp task            in_reduction(+:asum)     !! task 2\n",
    "       do i=1,N;  asum = asum + a(i);  enddo\n",
    "     !$omp end task\n",
    "     !$omp end masked\n",
    "\n",
    "     !$omp masked taskloop in_reduction(+:asum)     !! taskloop 2\n",
    "       do i=1,N;  asum = asum + a(i);  enddo\n",
    "     !$omp end masked taskloop\n",
    "\n",
    "  !$omp end parallel\n",
    "\n",
    "!! taskloop simd reductions\n",
    "\n",
    "  !$omp parallel masked\n",
    "  !$omp taskloop simd reduction(+:asum)             !! taskloop simd 3\n",
    "    do i=1,N;  asum = asum + a(i);  enddo\n",
    "  !$omp end taskloop simd\n",
    "  !$omp end parallel masked\n",
    "\n",
    "\n",
    "  !$omp parallel reduction(task, +:asum)            !! parallel reduction b\n",
    "\n",
    "    !$omp masked\n",
    "    !$omp task                 in_reduction(+:asum) !! task 4\n",
    "       do i=1,N;  asum = asum + a(i);  enddo\n",
    "    !$omp end task\n",
    "    !$omp end masked\n",
    "\n",
    "    !$omp masked taskloop simd in_reduction(+:asum) !! taskloop simd 4\n",
    "       do i=1,N;  asum = asum + a(i);  enddo\n",
    "    !$omp end masked taskloop simd\n",
    "\n",
    "  !$omp end parallel\n",
    "\n",
    "  print*,\"asum=\",asum   !! output: asum=30300\n",
    "\n",
    "end program"
   ]
  },
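  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "As described above for _taskloop 1_, a __reduction__ clause on a __taskloop__ construct behaves as though a __taskgroup__ construct with a __task_reduction__ clause enclosed the taskloop region and every generated task had a matching __in_reduction__ clause. The sketch below is a hypothetical addition, not one of the numbered examples; it writes both forms side by side so they can be compared, and the function names _sum_with_reduction_ and _sum_with_in_reduction_ are illustrative only. In the explicit form, the __nogroup__ clause is conforming because the enclosing __taskgroup__ construct already defines the reduction scope. With _a[i] = i_ for 100 elements, both functions should return 4950."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "//%compiler: clang\n",
    "//%cflags: -fopenmp\n",
    "\n",
    "/* Hypothetical sketch, not a numbered example: the two taskloop reduction forms side by side. */\n",
    "#include <stdio.h>\n",
    "#define N 100\n",
    "\n",
    "/* Form 1: reduction clause on taskloop; the implicit taskgroup region\n",
    "   of the taskloop construct is the scope of the reduction. */\n",
    "int sum_with_reduction(const int *a, int n)\n",
    "{\n",
    "   int asum = 0;\n",
    "   #pragma omp parallel masked\n",
    "   #pragma omp taskloop reduction(+:asum)\n",
    "      for (int i = 0; i < n; i++) asum += a[i];\n",
    "   return asum;\n",
    "}\n",
    "\n",
    "/* Form 2: explicit equivalent; taskgroup defines the scope and the\n",
    "   taskloop tasks join it with in_reduction, so nogroup is allowed. */\n",
    "int sum_with_in_reduction(const int *a, int n)\n",
    "{\n",
    "   int asum = 0;\n",
    "   #pragma omp parallel masked\n",
    "   #pragma omp taskgroup task_reduction(+:asum)\n",
    "   {\n",
    "      #pragma omp taskloop in_reduction(+:asum) nogroup\n",
    "         for (int i = 0; i < n; i++) asum += a[i];\n",
    "   }\n",
    "   return asum;\n",
    "}\n",
    "\n",
    "int main(void)\n",
    "{\n",
    "   int a[N];\n",
    "   for (int i = 0; i < N; i++) a[i] = i;\n",
    "\n",
    "   printf(\"%d %d\\n\", sum_with_reduction(a, N), sum_with_in_reduction(a, N));\n",
    "   // output: 4950 4950\n",
    "   return 0;\n",
    "}"
   ]
  },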
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Reduction with the __scope__ Construct"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
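    "The following example illustrates the use of the __scope__ construct to perform a reduction in a __parallel__ region. This is useful for producing a reduction, and accessing the reduction variables, inside a __parallel__ region without using a worksharing-loop construct. In the example, each thread adds its partial sum _loc_s_ to _s_ and increments _nthrs_ inside the __scope__ region; the reduced values are available after that region, which ends with an implicit barrier, so the __masked__ construct can print them."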
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "//%compiler: clang\n",
    "//%cflags: -fopenmp\n",
    "\n",
    "/*\n",
    "* name: scope_reduction.1\n",
    "* type: C++\n",
    "* version: omp_5.1\n",
    "*/\n",
    "#include <stdio.h>\n",
    "void do_work(int n, float a[], float &s)\n",
    "{\n",
    "   float loc_s = 0.0f;        // local sum\n",
    "   static int nthrs;\n",
    "   #pragma omp for\n",
    "      for (int i = 0; i < n; i++)\n",
    "         loc_s += a[i];\n",
    "   #pragma omp single\n",
    "   {\n",
    "      s = 0.0f;               // total sum\n",
    "      nthrs = 0;\n",
    "   }\n",
    "   #pragma omp scope reduction(+:s,nthrs)\n",
    "   {\n",
    "      s += loc_s;\n",
    "      nthrs++;\n",
    "   }\n",
    "   #pragma omp masked\n",
    "      printf(\"total sum = %f, nthrs = %d\\n\", s, nthrs);\n",
    "}\n",
    "\n",
    "float work(int n, float a[])\n",
    "{\n",
    "   float s;\n",
    "   #pragma omp parallel\n",
    "   {\n",
    "      do_work(n, a, s);\n",
    "   }\n",
    "   return s;\n",
    "}"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "!!%compiler: gfortran\n",
    "!!%cflags: -fopenmp\n",
    "\n",
    "! name: scope_reduction.1\n",
    "! type: F-free\n",
    "! version: omp_5.1\n",
    "subroutine do_work(n, a, s)\n",
    "   implicit none\n",
    "   integer n, i\n",
    "   real a(*), s, loc_s\n",
    "   integer, save :: nthrs\n",
    "\n",
    "   loc_s = 0.0                ! local sum\n",
    "   !$omp do\n",
    "      do i = 1, n\n",
    "         loc_s = loc_s + a(i)\n",
    "      end do\n",
    "   !$omp single\n",
    "      s = 0.0                 ! total sum\n",
    "      nthrs = 0\n",
    "   !$omp end single\n",
    "   !$omp scope reduction(+:s,nthrs)\n",
    "      s = s + loc_s\n",
    "      nthrs = nthrs + 1\n",
    "   !$omp end scope\n",
    "   !$omp masked\n",
    "      print *, \"total sum = \", s, \", nthrs = \", nthrs\n",
    "   !$omp end masked\n",
    "end subroutine\n",
    "\n",
    "function work(n, a) result(s)\n",
    "   implicit none\n",
    "   integer n\n",
    "   real a(*), s\n",
    "\n",
    "   !$omp parallel\n",
    "      call do_work(n, a, s)\n",
    "   !$omp end parallel\n",
    "end function"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### User-Defined Reduction"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The __declare__ __reduction__ directive can be used to specify user-defined reductions (UDRs) for user data types."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "In the following example, __declare__ __reduction__ directives are used to define  _min_  and  _max_  operations for the  _point_  data structure for computing the rectangle that encloses a set of 2-D points."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Each __declare__ __reduction__ directive defines a new reduction identifier, _min_ or _max_, to be used in a __reduction__ clause. The next item in the declaration is the data type (_struct point_) used in the reduction, followed by the combiner; here the functions _minproc_ and _maxproc_ perform the min and max operations, respectively, on the user data (of type _struct point_). The function argument lists use two special OpenMP variable identifiers, __omp_in__ and __omp_out__, which denote the two values to be combined in the \"real\" function; the __omp_out__ identifier indicates which one holds the result."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The initializer of the __declare__ __reduction__ directive specifies the initial value for the private variable of each implicit task. The __omp_priv__ identifier is used to denote the private variable."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "//%compiler: clang\n",
    "//%cflags: -fopenmp\n",
    "\n",
    "/*\n",
    "* name: udr.1\n",
    "* type: C\n",
    "* version: omp_4.0\n",
    "*/\n",
    "#include <stdio.h>\n",
    "#include <limits.h>\n",
    "\n",
    "struct point {\n",
    "  int x;\n",
    "  int y;\n",
    "};\n",
    "\n",
    "void minproc ( struct point *out, struct point *in )\n",
    "{\n",
    "  if ( in->x < out->x ) out->x = in->x;\n",
    "  if ( in->y < out->y ) out->y = in->y;\n",
    "}\n",
    "\n",
    "void maxproc ( struct point *out, struct point *in )\n",
    "{\n",
    "  if ( in->x > out->x ) out->x = in->x;\n",
    "  if ( in->y > out->y ) out->y = in->y;\n",
    "}\n",
    "\n",
    "#pragma omp declare reduction(min : struct point : \\\n",
    "        minproc(&omp_out, &omp_in)) \\\n",
    " initializer( omp_priv = { INT_MAX, INT_MAX } )\n",
    "\n",
    "#pragma omp declare reduction(max : struct point : \\\n",
    "        maxproc(&omp_out, &omp_in)) \\\n",
    " initializer( omp_priv = { 0, 0 } )\n",
    "\n",
    "void find_enclosing_rectangle ( int n, struct point points[] )\n",
    "{\n",
    "  struct point minp = { INT_MAX, INT_MAX }, maxp = {0,0};\n",
    "  int i;\n",
    "\n",
    "#pragma omp parallel for reduction(min:minp) reduction(max:maxp)\n",
    "  for ( i = 0; i < n; i++ ) {\n",
    "     minproc(&minp, &points[i]);\n",
    "     maxproc(&maxp, &points[i]);\n",
    "  }\n",
    "  printf(\"min = (%d, %d)\\n\", minp.x, minp.y);\n",
    "  printf(\"max = (%d, %d)\\n\", maxp.x, maxp.y);\n",
    "}"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The following example shows the corresponding code in Fortran. The __declare__ __reduction__ directives are specified as part of the declarations in subroutine _find_enclosing_rectangle_, and the procedures that perform the min and max operations are specified as internal subprograms."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "!!%compiler: gfortran\n",
    "!!%cflags: -fopenmp\n",
    "\n",
    "! name: udr.1\n",
    "! type: F-free\n",
    "! version: omp_4.0\n",
    "module data_type\n",
    "\n",
    "  type :: point\n",
    "    integer :: x\n",
    "    integer :: y\n",
    "  end type\n",
    "\n",
    "end module data_type\n",
    "\n",
    "subroutine find_enclosing_rectangle ( n, points )\n",
    "  use data_type\n",
    "  implicit none\n",
    "  integer :: n\n",
    "  type(point) :: points(*)\n",
    "\n",
    "  !$omp declare reduction(min : point : minproc(omp_out, omp_in)) &\n",
    "  !$omp&  initializer( omp_priv = point( HUGE(0), HUGE(0) ) )\n",
    "\n",
    "  !$omp declare reduction(max : point : maxproc(omp_out, omp_in)) &\n",
    "  !$omp&  initializer( omp_priv = point( 0, 0 ) )\n",
    "\n",
    "  type(point) :: minp = point( HUGE(0), HUGE(0) ), maxp = point( 0, 0 )\n",
    "  integer :: i\n",
    "\n",
    "  !$omp parallel do reduction(min:minp) reduction(max:maxp)\n",
    "  do i = 1, n\n",
    "     call minproc(minp, points(i))\n",
    "     call maxproc(maxp, points(i))\n",
    "  end do\n",
    "  print *, \"min = (\", minp%x, minp%y, \")\"\n",
    "  print *, \"max = (\", maxp%x, maxp%y, \")\"\n",
    "\n",
    " contains\n",
    "  subroutine minproc ( out, in )\n",
    "    implicit none\n",
    "    type(point), intent(inout) :: out\n",
    "    type(point), intent(in) :: in\n",
    "\n",
    "    out%x = min( out%x, in%x )\n",
    "    out%y = min( out%y, in%y )\n",
    "  end subroutine minproc\n",
    "\n",
    "  subroutine maxproc ( out, in )\n",
    "    implicit none\n",
    "    type(point), intent(inout) :: out\n",
    "    type(point), intent(in) :: in\n",
    "\n",
    "    out%x = max( out%x, in%x )\n",
    "    out%y = max( out%y, in%y )\n",
    "  end subroutine maxproc\n",
    "\n",
    "end subroutine"
   ]
  },
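  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "To make the roles of __omp_priv__, __omp_in__ and __omp_out__ in _udr.1_ more concrete, the following hypothetical sketch models serially, without any OpenMP directives, what __reduction(min:minp)__ computes. Each private copy starts from the initializer value, each copy is updated only by its own share of the iterations, and the combiner then merges every private copy (as __omp_in__) into the original variable (as __omp_out__). The team size of three (_NTHREADS_), the contiguous split of the iterations, and the sequential combining order are assumptions made for illustration; an implementation may partition the iterations and order the combinations differently, and the names _model_min_reduction_ and _NTHREADS_ are not part of the original example."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "//%compiler: clang\n",
    "//%cflags: -fopenmp\n",
    "\n",
    "/* Hypothetical serial model of reduction(min:minp) from udr.1; no OpenMP directives are used. */\n",
    "#include <stdio.h>\n",
    "#include <limits.h>\n",
    "\n",
    "struct point {\n",
    "  int x;\n",
    "  int y;\n",
    "};\n",
    "\n",
    "/* Same logic as minproc in udr.1: omp_out is the first argument, omp_in the second. */\n",
    "void minproc ( struct point *out, const struct point *in )\n",
    "{\n",
    "  if ( in->x < out->x ) out->x = in->x;\n",
    "  if ( in->y < out->y ) out->y = in->y;\n",
    "}\n",
    "\n",
    "#define NTHREADS 3   /* assumed team size for the model */\n",
    "\n",
    "struct point model_min_reduction ( int n, const struct point pts[], struct point orig )\n",
    "{\n",
    "  struct point priv[NTHREADS];\n",
    "  int t, i;\n",
    "\n",
    "  /* initializer: omp_priv = { INT_MAX, INT_MAX } for every private copy */\n",
    "  for ( t = 0; t < NTHREADS; t++ ) {\n",
    "    priv[t].x = INT_MAX;\n",
    "    priv[t].y = INT_MAX;\n",
    "  }\n",
    "\n",
    "  /* loop body: each modeled thread t updates only its own private copy,\n",
    "     here with an assumed contiguous block of iterations */\n",
    "  for ( i = 0; i < n; i++ ) {\n",
    "    t = (int)( (long long)i * NTHREADS / n );\n",
    "    minproc(&priv[t], &pts[i]);\n",
    "  }\n",
    "\n",
    "  /* combiner: minproc(&omp_out, &omp_in), with omp_out the original\n",
    "     variable and omp_in one private copy at a time */\n",
    "  for ( t = 0; t < NTHREADS; t++ )\n",
    "    minproc(&orig, &priv[t]);\n",
    "\n",
    "  return orig;\n",
    "}\n",
    "\n",
    "int main(void)\n",
    "{\n",
    "  struct point pts[] = { {4, 7}, {2, 9}, {5, 1}, {3, 3} };\n",
    "  struct point start = { INT_MAX, INT_MAX };\n",
    "  struct point r = model_min_reduction(4, pts, start);\n",
    "\n",
    "  printf(\"min = (%d, %d)\\n\", r.x, r.y);   /* prints: min = (2, 1) */\n",
    "  return 0;\n",
    "}"
   ]
  },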
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The following example performs the same computation as _udr.1_, but illustrates that more complex expressions can be written in the user-defined reduction declaration. Here, instead of calling the _minproc_ and _maxproc_ functions, the combiner inlines the min and max code as a single comma-separated expression."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "//%compiler: clang\n",
    "//%cflags: -fopenmp\n",
    "\n",
    "/*\n",
    "* name: udr.2\n",
    "* type: C\n",
    "* version: omp_4.0\n",
    "*/\n",
    "#include <stdio.h>\n",
    "#include <limits.h>\n",
    "\n",
    "struct point {\n",
    "  int x;\n",
    "  int y;\n",
    "};\n",
    "\n",
    "#pragma omp declare reduction(min : struct point : \\\n",
    "        omp_out.x = omp_in.x > omp_out.x  ? omp_out.x : omp_in.x, \\\n",
    "        omp_out.y = omp_in.y > omp_out.y  ? omp_out.y : omp_in.y ) \\\n",
    "        initializer( omp_priv = { INT_MAX, INT_MAX } )\n",
    "\n",
    "#pragma omp declare reduction(max : struct point : \\\n",
    "        omp_out.x = omp_in.x < omp_out.x  ? omp_out.x : omp_in.x,  \\\n",
    "        omp_out.y = omp_in.y < omp_out.y  ? omp_out.y : omp_in.y ) \\\n",
    "        initializer( omp_priv = { INT_MIN, INT_MIN } )\n",
    "\n",
    "void find_enclosing_rectangle ( int n, struct point points[] )\n",
    "{\n",
    "  /* INT_MIN (not 0) is the identity for max, so negative coordinates work */\n",
    "  struct point minp = { INT_MAX, INT_MAX }, maxp = { INT_MIN, INT_MIN };\n",
    "  int i;\n",
    "\n",
    "#pragma omp parallel for reduction(min:minp) reduction(max:maxp)\n",
    "  for ( i = 0; i < n; i++ ) {\n",
    "    if ( points[i].x < minp.x ) minp.x = points[i].x;\n",
    "    if ( points[i].y < minp.y ) minp.y = points[i].y;\n",
    "    if ( points[i].x > maxp.x ) maxp.x = points[i].x;\n",
    "    if ( points[i].y > maxp.y ) maxp.y = points[i].y;\n",
    "  }\n",
    "  printf(\"min = (%d, %d)\\n\", minp.x, minp.y);\n",
    "  printf(\"max = (%d, %d)\\n\", maxp.x, maxp.y);\n",
    "}"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The Fortran version of the same example is very similar. The difference is that the combiner in the __declare__ __reduction__ directive must be a single assignment statement to _omp_out_ (or a subroutine call), so both components are updated at once through the structure constructor _point( ... )_."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "!!%compiler: gfortran\n",
    "!!%cflags: -fopenmp\n",
    "\n",
    "! name: udr.2\n",
    "! type: F-free\n",
    "! version: omp_4.0\n",
    "module data_type\n",
    "\n",
    "  type :: point\n",
    "    integer :: x\n",
    "    integer :: y\n",
    "  end type\n",
    "\n",
    "end module data_type\n",
    "\n",
    "subroutine find_enclosing_rectangle ( n, points )\n",
    "  use data_type\n",
    "  implicit none\n",
    "  integer :: n\n",
    "  type(point) :: points(*)\n",
    "\n",
    "  !$omp declare reduction( min : point :  &\n",
    "  !$omp&   omp_out = point(min( omp_out%x, omp_in%x ), &\n",
    "  !$omp&                   min( omp_out%y, omp_in%y )) ) &\n",
    "  !$omp&   initializer( omp_priv = point( HUGE(0), HUGE(0) ) )\n",
    "\n",
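    "  !$omp declare reduction( max : point :  &\n",
    "  !$omp&   omp_out = point(max( omp_out%x, omp_in%x ), &\n",
    "  !$omp&                   max( omp_out%y, omp_in%y )) ) &\n",
    "  !$omp&   initializer( omp_priv = point( -HUGE(0), -HUGE(0) ) )\n",
    "\n",
    "  type(point) :: minp, maxp\n",
    "  integer :: i\n",
    "\n",
    "  ! Assign the starting values here: initializing minp and maxp in\n",
    "  ! their declaration would implicitly give them the SAVE attribute.\n",
    "  minp = point( HUGE(0), HUGE(0) )\n",
    "  maxp = point( -HUGE(0), -HUGE(0) )\n",
    "\n",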
    "  !$omp parallel do reduction(min: minp) reduction(max: maxp)\n",
    "  do i = 1, n\n",
    "     minp%x = min(minp%x, points(i)%x)\n",
    "     minp%y = min(minp%y, points(i)%y)\n",
    "     maxp%x = max(maxp%x, points(i)%x)\n",
    "     maxp%y = max(maxp%y, points(i)%y)\n",
    "  end do\n",
    "  print *, \"min = (\", minp%x, minp%y, \")\"\n",
    "  print *, \"max = (\", maxp%x, maxp%y, \")\"\n",
    "\n",
    "end subroutine"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The following example shows how the special variables can be passed as arguments to combiner and initializer routines: __omp_out__ and __omp_in__ to the combiner, and __omp_priv__ and __omp_orig__ to the initializer. The example returns the maximum value of an array together with its index. The __declare__ __reduction__ directive specifies a user-defined reduction operation _maxloc_ for the data type _struct mx_s_. The function _mx_combine_ is the combiner and the function _mx_init_ is the initializer. Initializing each private copy from the original variable is safe here because taking the maximum is idempotent. Combining the original value with itself leaves the result unchanged."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "//%compiler: clang\n",
    "//%cflags: -fopenmp\n",
    "\n",
    "/*\n",
    "* name: udr.3\n",
    "* type: C\n",
    "* version: omp_4.0\n",
    "*/\n",
    "\n",
    "#include <stdio.h>\n",
    "#define N 100\n",
    "\n",
    "struct mx_s {\n",
    "   float value;\n",
    "   int index;\n",
    "};\n",
    "\n",
    "/* prototype functions for combiner and initializer in\n",
    "   the declare reduction */\n",
    "void mx_combine(struct mx_s *out, struct mx_s *in);\n",
    "void mx_init(struct mx_s *priv, struct mx_s *orig);\n",
    "\n",
    "#pragma omp declare reduction(maxloc: struct mx_s: \\\n",
    "        mx_combine(&omp_out, &omp_in)) \\\n",
    "        initializer(mx_init(&omp_priv, &omp_orig))\n",
    "\n",
    "void mx_combine(struct mx_s *out, struct mx_s *in)\n",
    "{\n",
    "   if ( out->value < in->value ) {\n",
    "      out->value = in->value;\n",
    "      out->index = in->index;\n",
    "   }\n",
    "}\n",
    "\n",
    "void mx_init(struct mx_s *priv, struct mx_s *orig)\n",
    "{\n",
    "   priv->value = orig->value;\n",
    "   priv->index = orig->index;\n",
    "}\n",
    "\n",
    "int main(void)\n",
    "{\n",
    "   struct mx_s mx;\n",
    "   float val[N], d;\n",
    "   int i, count = N;\n",
    "\n",
    "   for (i = 0; i < count; i++) {\n",
    "      d = (N*0.8f - i);\n",
    "      val[i] = N * N - d * d;\n",
    "   }\n",
    "\n",
    "   mx.value = val[0];\n",
    "   mx.index = 0;\n",
    "   #pragma omp parallel for reduction(maxloc: mx)\n",
    "   for (i = 1; i < count; i++) {\n",
    "      if (mx.value < val[i])\n",
    "      {\n",
    "         mx.value = val[i];\n",
    "         mx.index = i;\n",
    "      }\n",
    "   }\n",
    "\n",
    "   printf(\"max value = %g, index = %d\\n\", mx.value, mx.index);\n",
    "   /* prints 10000, 80 */\n",
    "\n",
    "   return 0;\n",
    "}"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Below is the corresponding Fortran version of the above example. The __declare__ __reduction__ directive specifies the user-defined operation _maxloc_ for the derived type _mx_s_. The combiner _mx_combine_ and the initializer _mx_init_ are implemented as internal subroutines of the main program."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "!!%compiler: gfortran\n",
    "!!%cflags: -fopenmp\n",
    "\n",
    "! name: udr.3\n",
    "! type: F-free\n",
    "! version: omp_4.0\n",
    "program max_loc\n",
    "   implicit none\n",
    "\n",
    "   type :: mx_s\n",
    "      real value\n",
    "      integer index\n",
    "   end type\n",
    "\n",
    "   !$omp declare reduction(maxloc: mx_s: &\n",
    "   !$omp&        mx_combine(omp_out, omp_in)) &\n",
    "   !$omp&        initializer(mx_init(omp_priv, omp_orig))\n",
    "\n",
    "   integer, parameter :: N = 100\n",
    "   type(mx_s) :: mx\n",
    "   real :: val(N), d\n",
    "   integer :: i, count\n",
    "\n",
    "   count = N\n",
    "   do i = 1, count\n",
    "      d = N*0.8 - i + 1\n",
    "      val(i) = N * N - d * d\n",
    "   enddo\n",
    "\n",
    "   mx%value = val(1)\n",
    "   mx%index = 1\n",
    "   !$omp parallel do reduction(maxloc: mx)\n",
    "   do i = 2, count\n",
    "      if (mx%value < val(i)) then\n",
    "         mx%value = val(i)\n",
    "         mx%index = i\n",
    "      endif\n",
    "   enddo\n",
    "\n",
    "   print *, 'max value = ', mx%value, ' index = ', mx%index\n",
    "   ! prints 10000, 81\n",
    "\n",
    " contains\n",
    "\n",
    " subroutine mx_combine(out, in)\n",
    "   implicit none\n",
    "   type(mx_s), intent(inout) :: out\n",
    "   type(mx_s), intent(in) :: in\n",
    "\n",
    "   if ( out%value < in%value ) then\n",
    "      out%value = in%value\n",
    "      out%index = in%index\n",
    "   endif\n",
    " end subroutine mx_combine\n",
    "\n",
    " subroutine mx_init(priv, orig)\n",
    "   implicit none\n",
    "   type(mx_s), intent(out) :: priv\n",
    "   type(mx_s), intent(in) :: orig\n",
    "\n",
    "   priv%value = orig%value\n",
    "   priv%index = orig%index\n",
    " end subroutine mx_init\n",
    "\n",
    "end program"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The following example explains some details of user-defined reductions in Fortran when modules are used. The __declare__ __reduction__ directive appears in a module (_data_red_). The reduction identifier _.add._ is a user-defined operator, so it can be made accessible, through use association, in the scope that performs the reduction. The user-defined operator _.add._ and the subroutine _dt_init_ named in the __initializer__ clause are both defined in the same module."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The reduction operation (that is, the __reduction__ clause) is in the main program, where the reduction identifier _.add._ is accessible by use association. Because _.add._ is a user-defined operator, its explicit interface must also be accessible by use association in the current program unit. Because the __declare__ __reduction__ directive associated with this __reduction__ clause has an __initializer__ clause, the subroutine named in that clause must be accessible in the current scoping unit. In this case, the subroutine _dt_init_ is accessible by use association."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "!!%compiler: gfortran\n",
    "!!%cflags: -fopenmp\n",
    "\n",
    "! name: udr.4\n",
    "! type: F-free\n",
    "! version: omp_4.0\n",
    "module data_red\n",
    "! Declare data type.\n",
    "  type dt\n",
    "    real :: r1\n",
    "    real :: r2\n",
    "  end type\n",
    "\n",
    "! Declare the user-defined operator .add.\n",
    "  interface operator(.add.)\n",
    "    module procedure addc\n",
    "  end interface\n",
    "\n",
    "! Declare the user-defined reduction operator .add.\n",
    "!$omp declare reduction(.add.:dt:omp_out=omp_out.add.omp_in) &\n",
    "!$omp& initializer(dt_init(omp_priv))\n",
    "\n",
    " contains\n",
    "! Declare the initialization routine.\n",
    "  subroutine dt_init(u)\n",
    "    type(dt) :: u\n",
    "    u%r1 = 0.0\n",
    "    u%r2 = 0.0\n",
    "  end subroutine\n",
    "\n",
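    "! Declare the specific procedure for the .add. operator.\n",
    "! The combiner may be applied in any order, so .add. adds the\n",
    "! components pairwise, which is associative and commutative.\n",
    "  function addc(x1, x2) result(xresult)\n",
    "    type(dt), intent(in) :: x1, x2\n",
    "    type(dt) :: xresult\n",
    "    xresult%r1 = x1%r1 + x2%r1\n",
    "    xresult%r2 = x1%r2 + x2%r2\n",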
    "  end function\n",
    "\n",
    "end module data_red\n",
    "\n",
    "program main\n",
    "  use data_red, only : dt, dt_init, operator(.add.)\n",
    "\n",
    "  type(dt) :: xdt1, xdt2\n",
    "  integer :: i\n",
    "\n",
    "  xdt1 = dt(1.0,2.0)\n",
    "  xdt2 = dt(2.0,3.0)\n",
    "\n",
    "! The reduction operation\n",
    "!$omp parallel do reduction(.add.: xdt1)\n",
    "  do i = 1, 10\n",
    "    xdt1 = xdt1 .add. xdt2\n",
    "  end do\n",
    "!$omp end parallel do\n",
    "\n",
    "  print *, xdt1\n",
    "\n",
    "end program"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The following example uses a user-defined reduction to declare a plus (+) reduction for a C++ class. Because the __declare__ __reduction__ directive appears inside class _V_, the expressions in the directive are resolved in the scope of that class. Note also that the __initializer__ clause uses the copy constructor to initialize each private copy of the reduction variable, passing it the original variable through the special variable __omp_orig__."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "//%compiler: clang\n",
    "//%cflags: -fopenmp\n",
    "\n",
    "/*\n",
    "* name: udr.5\n",
    "* type: C++\n",
    "* version: omp_4.0\n",
    "*/\n",
    "class V {\n",
    "   float *p;\n",
    "   int n;\n",
    "\n",
    "public:\n",
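    "   V( int _n )     : n(_n)  { p = new float[n](); }\n",
    "   // Used by the initializer clause below: it takes only the size from\n",
    "   // the original and zero-fills the private copy (0 is the identity for +).\n",
    "   V( const V& m ) : n(m.n) { p = new float[n](); }\n",
    "   ~V() { delete[] p; }\n",
    "\n",
    "   // Element-wise addition; used as the combiner of the + reduction.\n",
    "   V& operator+= ( const V& other ) {\n",
    "      for ( int i = 0; i < n; i++ ) p[i] += other.p[i];\n",
    "      return *this;\n",
    "   }\n",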
    "\n",
    "   #pragma omp declare reduction( + : V : omp_out += omp_in ) \\\n",
    "           initializer(omp_priv(omp_orig))\n",
    "};"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The following example shows how user-defined reductions can be declared for some STL containers. The first __declare__ __reduction__ defines the plus (+) operation for _std::vector<int>_ using the _std::transform_ algorithm. The second and third define a merge operation: concatenation for _std::vector<int>_, and _std::list::merge_ for _std::list<int>_, which merges two sorted lists into one sorted list. The example shows that a user-defined reduction can be declared for a specific instantiation of an STL container template."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "//%compiler: clang\n",
    "//%cflags: -fopenmp\n",
    "\n",
    "/*\n",
    "* name: udr.6\n",
    "* type: C++\n",
    "* version: omp_4.0\n",
    "*/\n",
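    "#include <algorithm>\n",
    "#include <functional>\n",
    "#include <list>\n",
    "#include <vector>\n",
    "\n",
    "// Element-wise addition into omp_out. Each private copy is a zero-filled\n",
    "// vector of the original's size, so omp_in has as many elements as omp_out.\n",
    "#pragma omp declare reduction( + : std::vector<int> : \\\n",
    "     std::transform (omp_out.begin(), omp_out.end(),  \\\n",
    "        omp_in.begin(), omp_out.begin(), std::plus<int>())) \\\n",
    "     initializer( omp_priv = std::vector<int>(omp_orig.size(), 0) )\n",
    "\n",
    "#pragma omp declare reduction( merge : std::vector<int> : \\\n",
    "     omp_out.insert(omp_out.end(), omp_in.begin(), omp_in.end()))\n",
    "\n",
    "// std::list::merge assumes that both lists are already sorted.\n",
    "#pragma omp declare reduction( merge : std::list<int> : \\\n",
    "     omp_out.merge(omp_in))"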
   ]
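  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "As a hedged illustration, the following sketch shows how the _merge_ reduction for _std::vector<int>_ might be used in a loop. It is not one of the numbered examples. The _main_ function, the _evens_ vector and the loop bound are illustrative only. The `clang++` compiler line is also an assumption, because a C++ program with a _main_ function must be linked as C++. Each thread appends to its own private vector, which starts out empty because the directive has no __initializer__ clause. The combiner concatenates the private vectors in an unspecified order, so the result is sorted before it is printed."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "//%compiler: clang++\n",
    "//%cflags: -fopenmp\n",
    "\n",
    "/*\n",
    "* Hypothetical usage sketch for the merge reduction on std::vector<int>\n",
    "* (not one of the numbered examples)\n",
    "* type: C++\n",
    "*/\n",
    "#include <algorithm>\n",
    "#include <cstdio>\n",
    "#include <vector>\n",
    "\n",
    "#pragma omp declare reduction( merge : std::vector<int> : \\\n",
    "     omp_out.insert(omp_out.end(), omp_in.begin(), omp_in.end()))\n",
    "\n",
    "int main()\n",
    "{\n",
    "   std::vector<int> evens;\n",
    "\n",
    "   // No initializer clause: each thread's private copy of evens\n",
    "   // starts out as an empty vector.\n",
    "   #pragma omp parallel for reduction(merge: evens)\n",
    "   for (int i = 0; i < 100; i++) {\n",
    "      if (i % 2 == 0) evens.push_back(i);\n",
    "   }\n",
    "\n",
    "   // The private vectors are concatenated in an unspecified order,\n",
    "   // so sort before printing.\n",
    "   std::sort(evens.begin(), evens.end());\n",
    "   std::printf(\"%zu evens, first = %d, last = %d\\n\",\n",
    "               evens.size(), evens.front(), evens.back());\n",
    "   /* prints 50 evens, first = 0, last = 98 */\n",
    "\n",
    "   return 0;\n",
    "}"
   ]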
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Native",
   "language": "native",
   "name": "native"
  },
  "language_info": {
   "file_extension": ".c",
   "mimetype": "text/plain",
   "name": "c"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}