{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "## Reduction" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This section covers ways to perform reductions in parallel, task, taskloop, and SIMD regions." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### __reduction__ Clause" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The following example demonstrates the __reduction__ clause; note that some reductions can be expressed in the loop in several ways, as shown for the __max__ and __min__ reductions below:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "//%compiler: clang\n", "//%cflags: -fopenmp\n", "\n", "/*\n", "* name: reduction.1\n", "* type: C\n", "* version: omp_3.1\n", "*/\n", "#include <math.h>\n", "void reduction1(float *x, int *y, int n)\n", "{\n", " int i, b, c;\n", " float a, d;\n", " a = 0.0;\n", " b = 0;\n", " c = y[0];\n", " d = x[0];\n", " #pragma omp parallel for private(i) shared(x, y, n) \\\n", " reduction(+:a) reduction(^:b) \\\n", " reduction(min:c) reduction(max:d)\n", " for (i=0; i<n; i++) {\n", " a += x[i];\n", " b ^= y[i];\n", " if (c > y[i]) c = y[i];\n", " d = fmaxf(d,x[i]);\n", " }\n", "}" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "!!%compiler: gfortran\n", "!!%cflags: -fopenmp\n", "\n", "! name: reduction.1\n", "! 
type: F-free\n", "SUBROUTINE REDUCTION1(A, B, C, D, X, Y, N)\n", " REAL :: X(*), A, D\n", " INTEGER :: Y(*), N, B, C\n", " INTEGER :: I\n", " A = 0\n", " B = 0\n", " C = Y(1)\n", " D = X(1)\n", " !$OMP PARALLEL DO PRIVATE(I) SHARED(X, Y, N) REDUCTION(+:A) &\n", " !$OMP& REDUCTION(IEOR:B) REDUCTION(MIN:C) REDUCTION(MAX:D)\n", " DO I=1,N\n", " A = A + X(I)\n", " B = IEOR(B, Y(I))\n", " C = MIN(C, Y(I))\n", " IF (D < X(I)) D = X(I)\n", " END DO\n", "\n", "END SUBROUTINE REDUCTION1" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "A common implementation of the preceding example is to treat it as if it had been written as follows:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "//%compiler: clang\n", "//%cflags: -fopenmp\n", "\n", "/*\n", "* name: reduction.2\n", "* type: C\n", "*/\n", "#include <limits.h>\n", "#include <math.h>\n", "void reduction2(float *x, int *y, int n)\n", "{\n", " int i, b, b_p, c, c_p;\n", " float a, a_p, d, d_p;\n", " a = 0.0f;\n", " b = 0;\n", " c = y[0];\n", " d = x[0];\n", " #pragma omp parallel shared(a, b, c, d, x, y, n) \\\n", " private(a_p, b_p, c_p, d_p)\n", " {\n", " a_p = 0.0f;\n", " b_p = 0;\n", " c_p = INT_MAX;\n", " d_p = -HUGE_VALF;\n", " #pragma omp for private(i)\n", " for (i=0; i<n; i++) {\n", " a_p += x[i];\n", " b_p ^= y[i];\n", " if (c_p > y[i]) c_p = y[i];\n", " d_p = fmaxf(d_p,x[i]);\n", " }\n", " #pragma omp critical\n", " {\n", " a += a_p;\n", " b ^= b_p;\n", " if( c > c_p ) c = c_p;\n", " d = fmaxf(d,d_p);\n", " }\n", " }\n", "}" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "\n", "!!%compiler: gfortran\n", "!!%cflags: -fopenmp\n", "\n", "! name: reduction.2\n", "! 
type: F-free\n", " SUBROUTINE REDUCTION2(A, B, C, D, X, Y, N)\n", " REAL :: X(*), A, D\n", " INTEGER :: Y(*), N, B, C\n", " REAL :: A_P, D_P\n", " INTEGER :: I, B_P, C_P\n", " A = 0\n", " B = 0\n", " C = Y(1)\n", " D = X(1)\n", " !$OMP PARALLEL SHARED(X, Y, A, B, C, D, N) &\n", " !$OMP& PRIVATE(A_P, B_P, C_P, D_P)\n", " A_P = 0.0\n", " B_P = 0\n", " C_P = HUGE(C_P)\n", " D_P = -HUGE(D_P)\n", " !$OMP DO PRIVATE(I)\n", " DO I=1,N\n", " A_P = A_P + X(I)\n", " B_P = IEOR(B_P, Y(I))\n", " C_P = MIN(C_P, Y(I))\n", " IF (D_P < X(I)) D_P = X(I)\n", " END DO\n", " !$OMP CRITICAL\n", " A = A + A_P\n", " B = IEOR(B, B_P)\n", " C = MIN(C, C_P)\n", " D = MAX(D, D_P)\n", " !$OMP END CRITICAL\n", " !$OMP END PARALLEL\n", " END SUBROUTINE REDUCTION2" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The following program is non-conforming because the reduction is on the **intrinsic procedure name** __MAX__ but that name has been redefined to be the variable named __MAX__." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "!!%compiler: gfortran\n", "!!%cflags: -fopenmp\n", "\n", "! name: reduction.3\n", "! type: F-free\n", " PROGRAM REDUCTION_WRONG\n", " MAX = HUGE(0)\n", " M = 0\n", "\n", " !$OMP PARALLEL DO REDUCTION(MAX: M)\n", "! MAX is no longer the intrinsic so this is non-conforming\n", " DO I = 1, 100\n", " CALL SUB(M,I)\n", " END DO\n", "\n", " END PROGRAM REDUCTION_WRONG\n", "\n", " SUBROUTINE SUB(M,I)\n", " M = MAX(M,I)\n", " END SUBROUTINE SUB" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The following conforming program performs the reduction using the **intrinsic procedure name** __MAX__ even though the intrinsic __MAX__ has been renamed to __REN__." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "!!%compiler: gfortran\n", "!!%cflags: -fopenmp\n", "\n", "! name: reduction.4\n", "! 
type: F-free\n", "MODULE M\n", " INTRINSIC MAX\n", "END MODULE M\n", "\n", "PROGRAM REDUCTION3\n", " USE M, REN => MAX\n", " N = 0\n", "!$OMP PARALLEL DO REDUCTION(REN: N) ! still does MAX\n", " DO I = 1, 100\n", " N = MAX(N,I)\n", " END DO\n", "END PROGRAM REDUCTION3" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The following conforming program performs the reduction using _intrinsic procedure name_ __MAX__ even though the intrinsic __MAX__ has been renamed to __MIN__." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "!!%compiler: gfortran\n", "!!%cflags: -fopenmp\n", "\n", "! name: reduction.5\n", "! type: F-free\n", "MODULE MOD\n", " INTRINSIC MAX, MIN\n", "END MODULE MOD\n", "\n", "PROGRAM REDUCTION4\n", " USE MOD, MIN=>MAX, MAX=>MIN\n", " REAL :: R\n", " R = -HUGE(0.0)\n", "\n", "!$OMP PARALLEL DO REDUCTION(MIN: R) ! still does MAX\n", " DO I = 1, 1000\n", " R = MIN(R, SIN(REAL(I)))\n", " END DO\n", " PRINT *, R\n", "END PROGRAM REDUCTION4" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The following example is non-conforming because the initialization (__a = 0__) of the original list item __a__ is not synchronized with the update of __a__ as a result of the reduction computation in the __for__ loop. Therefore, the example may print an incorrect value for __a__." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "To avoid this problem, the initialization of the original list item __a__ should complete before any update of __a__ as a result of the __reduction__ clause. This can be achieved by adding an explicit barrier after the assignment __a = 0__, or by enclosing the assignment __a = 0__ in a __single__ directive (which has an implied barrier), or by initializing __a__ before the start of the __parallel__ region." 
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "//%compiler: clang\n", "//%cflags: -fopenmp\n", "\n", "/*\n", "* name: reduction.6\n", "* type: C\n", "* version: omp_5.1\n", "*/\n", "#include <stdio.h>\n", "\n", "int main (void)\n", "{\n", " int a, i;\n", "\n", " #pragma omp parallel shared(a) private(i)\n", " {\n", " #pragma omp masked\n", " a = 0;\n", "\n", " // To avoid race conditions, add a barrier here.\n", "\n", " #pragma omp for reduction(+:a)\n", " for (i = 0; i < 10; i++) {\n", " a += i;\n", " }\n", "\n", " #pragma omp single\n", " printf (\"Sum is %d\\n\", a);\n", " }\n", " return 0;\n", "}" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "!!%compiler: gfortran\n", "!!%cflags: -fopenmp\n", "\n", "! name: reduction.6\n", "! type: F-fixed\n", "! version: omp_5.1\n", " INTEGER A, I\n", "\n", "!$OMP PARALLEL SHARED(A) PRIVATE(I)\n", "\n", "!$OMP MASKED\n", " A = 0\n", "!$OMP END MASKED\n", "\n", " ! To avoid race conditions, add a barrier here.\n", "\n", "!$OMP DO REDUCTION(+:A)\n", " DO I= 0, 9\n", " A = A + I\n", " END DO\n", "\n", "!$OMP SINGLE\n", " PRINT *, \"Sum is \", A\n", "!$OMP END SINGLE\n", "\n", "!$OMP END PARALLEL\n", "\n", " END" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The following example demonstrates the reduction of array _a_. In C/C++ this is illustrated by the explicit use of an array section _a[0:N]_ in the __reduction__ clause. The corresponding Fortran example uses array syntax supported in the base language. As of the OpenMP 4.5 specification, the explicit use of an array section in the __reduction__ clause in Fortran is not permitted, but this oversight has been fixed in the OpenMP 5.0 specification." 
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "//%compiler: clang\n", "//%cflags: -fopenmp\n", "\n", "/*\n", "* name: reduction.7\n", "* type: C\n", "* version: omp_4.5\n", "*/\n", "#include \n", "\n", "#define N 100\n", "void init(int n, float (*b)[N]);\n", "\n", "int main(){\n", "\n", " int i,j;\n", " float a[N], b[N][N];\n", "\n", " init(N,b);\n", "\n", " for(i=0; i\n", "#include\n", "#define N 10\n", "\n", "typedef struct node_tag {\n", " int val;\n", " struct node_tag *next;\n", "} node_t;\n", "\n", "int linked_list_sum(node_t *p)\n", "{\n", " int res = 0;\n", "\n", " #pragma omp taskgroup task_reduction(+: res)\n", " {\n", " node_t* aux = p;\n", " while(aux != 0)\n", " {\n", " #pragma omp task in_reduction(+: res)\n", " res += aux->val;\n", "\n", " aux = aux->next;\n", " }\n", " }\n", " return res;\n", "}\n", "\n", "\n", "int main(int argc, char *argv[]) {\n", " int i;\n", "// Create the root node.\n", " node_t* root = (node_t*) malloc(sizeof(node_t));\n", " root->val = 1;\n", "\n", " node_t* aux = root;\n", "\n", "// Create N-1 more nodes.\n", " for(i=2;i<=N;++i){\n", " aux->next = (node_t*) malloc(sizeof(node_t));\n", " aux = aux->next;\n", " aux->val = i;\n", " }\n", "\n", " aux->next = 0;\n", "\n", " #pragma omp parallel\n", " #pragma omp single\n", " {\n", " int result = linked_list_sum(root);\n", " printf( \"Calculated: %d Analytic:%d\\n\", result, (N*(N+1)/2) );\n", " }\n", "\n", " return 0;\n", "}" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "!!%compiler: gfortran\n", "!!%cflags: -fopenmp\n", "\n", "! name: task_reduction.1\n", "! 
type: F-free\n", "\n", "module m\n", " type node_t\n", " integer :: val\n", " type(node_t), pointer :: next\n", " end type\n", "end module m\n", "\n", "function linked_list_sum(p) result(res)\n", " use m\n", " implicit none\n", " type(node_t), pointer :: p\n", " type(node_t), pointer :: aux\n", " integer :: res\n", "\n", " res = 0\n", "\n", " !$omp taskgroup task_reduction(+: res)\n", " aux => p\n", " do while (associated(aux))\n", " !$omp task in_reduction(+: res)\n", " res = res + aux%val\n", " !$omp end task\n", " aux => aux%next\n", " end do\n", " !$omp end taskgroup\n", "end function linked_list_sum\n", "\n", "\n", "program main\n", " use m\n", " implicit none\n", " type(node_t), pointer :: root, aux\n", " integer :: res, i\n", " integer, parameter :: N=10\n", "\n", " interface\n", " function linked_list_sum(p) result(res)\n", " use m\n", " implicit none\n", " type(node_t), pointer :: p\n", " integer :: res\n", " end function\n", " end interface\n", "! Create the root node.\n", " allocate(root)\n", " root%val = 1\n", " aux => root\n", "\n", "! Create N-1 more nodes.\n", " do i = 2,N\n", " allocate(aux%next)\n", " aux => aux%next\n", " aux%val = i\n", " end do\n", "\n", " aux%next => null()\n", "\n", " !$omp parallel\n", " !$omp single\n", " res = linked_list_sum(root)\n", " print *, \"Calculated:\", res, \" Analytic:\", (N*(N+1))/2\n", " !$omp end single\n", " !$omp end parallel\n", "\n", "end program main" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "In OpenMP 5.0 the __task__ _reduction-modifier_ for the __reduction__ clause was introduced to provide a means of performing reductions among implicit and explicit tasks." 
] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The __reduction__ clause of a __parallel__ or worksharing construct may specify the __task__ _reduction-modifier_ to include explicit task reductions within their region, provided the reduction operators ( _reduction-identifiers_ ) and variables ( _list items_ ) of the participating tasks match those of the implicit tasks." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "There are two reduction use cases (identified by USE CASE #) in the _task_reduction.2_ example below." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "In USE CASE 1 a __task__ modifier in the __reduction__ clause of the __parallel__ construct is used to include the reductions of any participating tasks, that is, those with an __in_reduction__ clause and matching _reduction-identifiers_ (__+__) and list items (__x__)." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Note that a __taskgroup__ construct (with a __task_reduction__ clause) is not necessary to scope the explicit task reduction (as seen in the example above). Hence, even without the implicit task reduction statement (without the C __x++__ and Fortran __x=x+1__ statements), the __task__ _reduction-modifier_ in a __reduction__ clause of the __parallel__ construct can be used to avoid having to create a __taskgroup__ construct (and its __task_reduction__ clause) around the task generating structure." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "In USE CASE 2 tasks participating in the reduction are within a worksharing region (a parallel worksharing-loop construct). Here, too, no __taskgroup__ is required, and the _reduction-identifier_ (__+__) and list item (variable __x__) match as required." 
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "//%compiler: clang\n", "//%cflags: -fopenmp\n", "\n", "/*\n", "* name: task_reduction.2\n", "* type: C\n", "* version: omp_5.0\n", "*/\n", "#include \n", "int main(void){\n", " int N=100, M=10;\n", " int i, x;\n", "\n", "// USE CASE 1 explicit-task reduction + parallel reduction clause\n", " x=0;\n", " #pragma omp parallel num_threads(M) reduction(task,+:x)\n", " {\n", "\n", " x++; // implicit task reduction statement\n", "\n", " #pragma omp single\n", " for(i=0;i\n", "int f(int);\n", "int g(int);\n", "int main()\n", "{\n", " int sum1=0, sum2=0;\n", " int i;\n", " const int n = 100;\n", "\n", " #pragma omp target teams distribute reduction(+:sum1)\n", " for (int i = 0; i < n; i++) {\n", " sum1 += f(i);\n", " }\n", "\n", " #pragma omp target teams distribute reduction(+:sum2)\n", " for (int i = 0; i < n; i++) {\n", " sum2 += g(i) * sum1;\n", " }\n", "\n", " printf( \"sum1 = %d, sum2 = %d\\n\", sum1, sum2);\n", " //OUTPUT: sum1 = 9900, sum2 = 147015000\n", " return 0;\n", "}\n", "\n", "int f(int res){ return res*2; }\n", "int g(int res){ return res*3; }" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "!!%compiler: gfortran\n", "!!%cflags: -fopenmp\n", "\n", "! name: target_reduction.1\n", "! type: F-free\n", "! 
version: omp_5.0\n", "program target_reduction_ex1\n", " interface\n", " function f(res)\n", " integer :: f, res\n", " end function\n", " function g(res)\n", " integer :: g, res\n", " end function\n", " end interface\n", " integer :: sum1, sum2, i\n", " integer, parameter :: n = 100\n", " sum1 = 0\n", " sum2 = 0\n", " !$omp target teams distribute reduction(+:sum1)\n", " do i=1,n\n", " sum1 = sum1 + f(i)\n", " end do\n", " !$omp target teams distribute reduction(+:sum2)\n", " do i=1,n\n", " sum2 = sum2 + g(i)*sum1\n", " end do\n", " print *, \"sum1 = \", sum1, \", sum2 = \", sum2\n", " !!OUTPUT: sum1 = 10100 , sum2 = 153015000\n", "end program\n", "\n", "\n", "integer function f(res)\n", " integer :: res\n", " f = res*2\n", "end function\n", "integer function g(res)\n", " integer :: res\n", " g = res*3\n", "end function" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "In the next example, the variables __sum1__ and __sum2__ remain on the device for the duration of the __target__ __data__ region so that it is their device copies that are updated by the reductions. Note the significance of mapping __sum1__ on the second __target__ construct; otherwise, it would be treated by default as firstprivate and the result computed for __sum1__ in the prior __target__ region may not be used. Alternatively, a __target__ __update__ construct could be used between the two __target__ constructs to update the host version of __sum1__ with the value that is in the corresponding device version after the completion of the first construct." 
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "//%compiler: clang\n", "//%cflags: -fopenmp\n", "\n", "/*\n", "* name: target_reduction.2\n", "* type: C\n", "* version: omp_5.0\n", "*/\n", "#include <stdio.h>\n", "int f(int);\n", "int g(int);\n", "int main()\n", "{\n", " int sum1=0, sum2=0;\n", " int i;\n", " const int n = 100;\n", "\n", " #pragma omp target data map(sum1,sum2)\n", " {\n", " #pragma omp target teams distribute reduction(+:sum1)\n", " for (int i = 0; i < n; i++) {\n", " sum1 += f(i);\n", " }\n", "\n", " #pragma omp target teams distribute map(sum1) reduction(+:sum2)\n", " for (int i = 0; i < n; i++) {\n", " sum2 += g(i) * sum1;\n", " }\n", " }\n", " printf( \"sum1 = %d, sum2 = %d\\n\", sum1, sum2);\n", " //OUTPUT: sum1 = 9900, sum2 = 147015000\n", " return 0;\n", "}\n", "\n", "int f(int res){ return res*2; }\n", "int g(int res){ return res*3; }" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "!!%compiler: gfortran\n", "!!%cflags: -fopenmp\n", "\n", "! name: target_reduction.2\n", "! type: F-free\n", "! 
version: omp_5.0\n", "\n", "program target_reduction_ex2\n", " interface\n", " function f(res)\n", " integer :: f, res\n", " end function\n", " function g(res)\n", " integer :: g, res\n", " end function\n", " end interface\n", " integer :: sum1, sum2, i\n", " integer, parameter :: n = 100\n", " sum1 = 0\n", " sum2 = 0\n", " !$omp target data map(sum1, sum2)\n", " !$omp target teams distribute reduction(+:sum1)\n", " do i=1,n\n", " sum1 = sum1 + f(i)\n", " end do\n", " !$omp target teams distribute map(sum1) reduction(+:sum2)\n", " do i=1,n\n", " sum2 = sum2 + g(i)*sum1\n", " end do\n", " !$omp end target data\n", " print *, \"sum1 = \", sum1, \", sum2 = \", sum2\n", " !!OUTPUT: sum1 = 10100 , sum2 = 153015000\n", "end program\n", "\n", "\n", "integer function f(res)\n", " integer :: res\n", " f = res*2\n", "end function\n", "integer function g(res)\n", " integer :: res\n", " g = res*3\n", "end function" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Task Reduction with Target Constructs" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The following examples illustrate how task reductions can apply to target tasks that result from a __target__ construct with the __in_reduction__ clause. Here, the __in_reduction__ clause specifies that the target task participates in the task reduction defined in the scope of the enclosing __taskgroup__ construct. Partial results from all tasks participating in the task reduction will be combined (in some order) into the original variable listed in the __task_reduction__ clause before exiting the __taskgroup__ region." 
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "//%compiler: clang\n", "//%cflags: -fopenmp\n", "\n", "/*\n", "* name: target_task_reduction.1\n", "* type: C\n", "* version: omp_5.2\n", "*/\n", "\n", "#include <stdio.h>\n", "#pragma omp declare target enter(device_compute)\n", "void device_compute(int *);\n", "void host_compute(int *);\n", "int main()\n", "{\n", " int sum = 0;\n", "\n", " #pragma omp parallel masked\n", " #pragma omp taskgroup task_reduction(+:sum)\n", " {\n", " #pragma omp target in_reduction(+:sum) nowait\n", " device_compute(&sum);\n", "\n", " #pragma omp task in_reduction(+:sum)\n", " host_compute(&sum);\n", " }\n", " printf( \"sum = %d\\n\", sum);\n", " //OUTPUT: sum = 2\n", " return 0;\n", "}\n", "\n", "void device_compute(int *sum){ *sum = 1; }\n", "void host_compute(int *sum){ *sum = 1; }" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "!!%compiler: gfortran\n", "!!%cflags: -fopenmp\n", "\n", "! name: target_task_reduction.1\n", "! type: F-free\n", "! 
version: omp_5.2\n", "\n", "program target_task_reduction_ex1\n", " interface\n", " subroutine device_compute(res)\n", " !$omp declare target enter(device_compute)\n", " integer :: res\n", " end subroutine device_compute\n", " subroutine host_compute(res)\n", " integer :: res\n", " end subroutine host_compute\n", " end interface\n", " integer :: sum\n", " sum = 0\n", " !$omp parallel masked\n", " !$omp taskgroup task_reduction(+:sum)\n", " !$omp target in_reduction(+:sum) nowait\n", " call device_compute(sum)\n", " !$omp end target\n", " !$omp task in_reduction(+:sum)\n", " call host_compute(sum)\n", " !$omp end task\n", " !$omp end taskgroup\n", " !$omp end parallel masked\n", " print *, \"sum = \", sum\n", " !!OUTPUT: sum = 2\n", "end program\n", "\n", "subroutine device_compute(sum)\n", " integer :: sum\n", " sum = 1\n", "end subroutine\n", "subroutine host_compute(sum)\n", " integer :: sum\n", " sum = 1\n", "end subroutine" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "In the next pair of examples, the task reduction is defined by a __reduction__ clause with the __task__ modifier, rather than a __task_reduction__ clause on a __taskgroup__ construct. Again, the partial results from the participating tasks will be combined in some order into the original reduction variable, __sum__." 
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "//%compiler: clang\n", "//%cflags: -fopenmp\n", "\n", "/*\n", "* name: target_task_reduction.2a\n", "* type: C\n", "* version: omp_5.2\n", "*/\n", "#include <stdio.h>\n", "#pragma omp declare target enter(device_compute)\n", "extern void device_compute(int *);\n", "extern void host_compute(int *);\n", "int main()\n", "{\n", " int sum = 0;\n", "\n", " #pragma omp parallel sections reduction(task, +:sum)\n", " {\n", " #pragma omp section\n", " {\n", " #pragma omp target in_reduction(+:sum)\n", " device_compute(&sum);\n", " }\n", " #pragma omp section\n", " {\n", " host_compute(&sum);\n", " }\n", " }\n", " printf( \"sum = %d\\n\", sum);\n", " //OUTPUT: sum = 2\n", " return 0;\n", "}\n", "\n", "void device_compute(int *sum){ *sum = 1; }\n", "void host_compute(int *sum){ *sum = 1; }" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "!!%compiler: gfortran\n", "!!%cflags: -fopenmp\n", "\n", "! name: target_task_reduction.2a\n", "! type: F-free\n", "! 
version: omp_5.2\n", "\n", "program target_task_reduction_ex2\n", " interface\n", " subroutine device_compute(res)\n", " !$omp declare target enter(device_compute)\n", " integer :: res\n", " end subroutine device_compute\n", " subroutine host_compute(res)\n", " integer :: res\n", " end subroutine host_compute\n", " end interface\n", " integer :: sum\n", " sum = 0\n", " !$omp parallel sections reduction(task,+:sum)\n", " !$omp section\n", " !$omp target in_reduction(+:sum) nowait\n", " call device_compute(sum)\n", " !$omp end target\n", " !$omp section\n", " call host_compute(sum)\n", " !$omp end parallel sections\n", " print *, \"sum = \", sum\n", " !!OUTPUT: sum = 2\n", "end program\n", "\n", "subroutine device_compute(sum)\n", " integer :: sum\n", " sum = 1\n", "end subroutine\n", "subroutine host_compute(sum)\n", " integer :: sum\n", " sum = 1\n", "end subroutine" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Next, the __task__ modifier is again used to define a task reduction over participating tasks. This time, the participating tasks are a target task resulting from a __target__ construct with the __in_reduction__ clause, and the implicit task (executing on the primary thread) that calls __host_compute__. As before, the partial results from these participating tasks are combined in some order into the original reduction variable." 
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "//%compiler: clang\n", "//%cflags: -fopenmp\n", "\n", "/*\n", "* name: target_task_reduction.2b\n", "* type: C\n", "* version: omp_5.2\n", "*/\n", "#include <stdio.h>\n", "#pragma omp declare target enter(device_compute)\n", "extern void device_compute(int *);\n", "extern void host_compute(int *);\n", "int main()\n", "{\n", " int sum = 0;\n", "\n", " #pragma omp parallel masked reduction(task, +:sum)\n", " {\n", " #pragma omp target in_reduction(+:sum) nowait\n", " device_compute(&sum);\n", "\n", " host_compute(&sum);\n", " }\n", " printf( \"sum = %d\\n\", sum);\n", " //OUTPUT: sum = 2\n", " return 0;\n", "}\n", "\n", "void device_compute(int *sum){ *sum = 1; }\n", "void host_compute(int *sum){ *sum = 1; }" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "!!%compiler: gfortran\n", "!!%cflags: -fopenmp\n", "\n", "! name: target_task_reduction.2b\n", "! type: F-free\n", "! 
version: omp_5.2\n", "\n", "program target_task_reduction_ex2b\n", " interface\n", " subroutine device_compute(res)\n", " !$omp declare target enter(device_compute)\n", " integer :: res\n", " end subroutine device_compute\n", " subroutine host_compute(res)\n", " integer :: res\n", " end subroutine host_compute\n", " end interface\n", " integer :: sum\n", " sum = 0\n", " !$omp parallel masked reduction(task,+:sum)\n", " !$omp target in_reduction(+:sum) nowait\n", " call device_compute(sum)\n", " !$omp end target\n", " call host_compute(sum)\n", " !$omp end parallel masked\n", " print *, \"sum = \", sum\n", " !!OUTPUT: sum = 2\n", "end program\n", "\n", "\n", "subroutine device_compute(sum)\n", " integer :: sum\n", " sum = 1\n", "end subroutine\n", "subroutine host_compute(sum)\n", " integer :: sum\n", " sum = 1\n", "end subroutine" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Taskloop Reduction" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "In the OpenMP 5.0 Specification, the __taskloop__ construct was extended to support reductions." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The following two examples show how to implement a reduction over an array using taskloop reduction in two different ways. In the first example we apply the __reduction__ clause to the __taskloop__ construct. As explained above in the task reduction examples, a reduction over tasks is divided into two components: the scope of the reduction, which is defined by a __taskgroup__ region, and the tasks that participate in the reduction. In this example, the __reduction__ clause defines both semantics. First, it specifies that the implicit __taskgroup__ region associated with the __taskloop__ construct is the scope of the reduction, and second, it defines all tasks created by the __taskloop__ construct as participants of the reduction. 
Regarding the first property, it is important to note that adding the __nogroup__ clause to the __taskloop__ construct makes the code non-conforming, because the tasks would then participate in a reduction whose scope has not been defined." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "//%compiler: clang\n", "//%cflags: -fopenmp\n", "\n", "/*\n", "* name: taskloop_reduction.1\n", "* type: C\n", "* version: omp_5.0\n", "*/\n", "#include <stdio.h>\n", "\n", "int array_sum(int n, int *v) {\n", " int i;\n", " int res = 0;\n", "\n", " #pragma omp taskloop reduction(+: res)\n", " for(i = 0; i < n; ++i)\n", " res += v[i];\n", "\n", " return res;\n", "}\n", "\n", "int main(int argc, char *argv[]) {\n", " int n = 10;\n", " int v[10] = {1,2,3,4,5,6,7,8,9,10};\n", "\n", " #pragma omp parallel\n", " #pragma omp single\n", " {\n", " int res = array_sum(n, v);\n", " printf(\"The result is %d\\n\", res);\n", " }\n", " return 0;\n", "}" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "!!%compiler: gfortran\n", "!!%cflags: -fopenmp\n", "\n", "! name: taskloop_reduction.1\n", "! type: F-free\n", "! 
version: omp_5.0\n", "function array_sum(n, v) result(res)\n", " implicit none\n", " integer :: n, v(n), res\n", " integer :: i\n", "\n", " res = 0\n", " !$omp taskloop reduction(+: res)\n", " do i=1, n\n", " res = res + v(i)\n", " end do\n", " !$omp end taskloop\n", "\n", "end function array_sum\n", "\n", "program main\n", " implicit none\n", " integer :: n, v(10), res\n", " integer :: i\n", "\n", " integer, external :: array_sum\n", "\n", " n = 10\n", " do i=1, n\n", " v(i) = i\n", " end do\n", "\n", " !$omp parallel\n", " !$omp single\n", " res = array_sum(n, v)\n", " print *, \"The result is\", res\n", " !$omp end single\n", " !$omp end parallel\n", "end program main" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The second example computes exactly the same value as in the preceding _taskloop_reduction.1_ code section, but in a very different way. First, in the _array_sum_ function a __taskgroup__ region is created that defines the scope of a new reduction using the __task_reduction__ clause. After that, a task and also the tasks generated by a taskloop participate in that reduction by using the __in_reduction__ clause on the __task__ and __taskloop__ constructs, respectively. Note that the __nogroup__ clause was added to the __taskloop__ construct. This is allowed because what is expressed with the __in_reduction__ clause is different from what is expressed with the __reduction__ clause. With the __in_reduction__ clause, the generated tasks are specified to participate in a previously declared reduction, whereas with the __reduction__ clause, the creation of a new reduction is specified and all tasks generated by the taskloop participate in it." 
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "//%compiler: clang\n", "//%cflags: -fopenmp\n", "\n", "/*\n", "* name: taskloop_reduction.2\n", "* type: C\n", "* version: omp_5.0\n", "*/\n", "#include <stdio.h>\n", "\n", "int array_sum(int n, int *v) {\n", " int i;\n", " int res = 0;\n", "\n", " #pragma omp taskgroup task_reduction(+: res)\n", " {\n", " if (n > 0) {\n", " #pragma omp task in_reduction(+: res)\n", " res = res + v[0];\n", "\n", " #pragma omp taskloop in_reduction(+: res) nogroup\n", " for(i = 1; i < n; ++i)\n", " res += v[i];\n", " }\n", " }\n", "\n", " return res;\n", "}\n", "\n", "int main(int argc, char *argv[]) {\n", " int n = 10;\n", " int v[10] = {1,2,3,4,5,6,7,8,9,10};\n", "\n", " #pragma omp parallel\n", " #pragma omp single\n", " {\n", " int res = array_sum(n, v);\n", " printf(\"The result is %d\\n\", res);\n", " }\n", " return 0;\n", "}" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "!!%compiler: gfortran\n", "!!%cflags: -fopenmp\n", "\n", "! name: taskloop_reduction.2\n", "! type: F-free\n", "! 
version: omp_5.0\n", "function array_sum(n, v) result(res)\n", " implicit none\n", " integer :: n, v(n), res\n", " integer :: i\n", "\n", " res = 0\n", " !$omp taskgroup task_reduction(+: res)\n", " if (n > 0) then\n", " !$omp task in_reduction(+: res)\n", " res = res + v(1)\n", " !$omp end task\n", "\n", " !$omp taskloop in_reduction(+: res) nogroup\n", " do i=2, n\n", " res = res + v(i)\n", " end do\n", " !$omp end taskloop\n", " endif\n", " !$omp end taskgroup\n", "\n", "end function array_sum\n", "\n", "program main\n", " implicit none\n", " integer :: n, v(10), res\n", " integer :: i\n", "\n", " integer, external :: array_sum\n", "\n", " n = 10\n", " do i=1, n\n", " v(i) = i\n", " end do\n", "\n", " !$omp parallel\n", " !$omp single\n", " res = array_sum(n, v)\n", " print *, \"The result is\", res\n", " !$omp end single\n", " !$omp end parallel\n", "end program main" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "In the OpenMP 5.0 Specification, __reduction__ clauses for the __taskloop simd__ construct were also added." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The examples below compare reductions for the __taskloop__ and the __taskloop__ __simd__ constructs. These examples illustrate the use of __reduction__ clauses within \"stand-alone\" __taskloop__ constructs, and the use of __in_reduction__ clauses for tasks of taskloops to participate with other reductions within the scope of a parallel region." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**taskloop reductions:**" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "In the _taskloop reductions_ section of the example below, _taskloop 1_ uses the __reduction__ clause in a __taskloop__ construct for a sum reduction, accumulated in _asum_ . 
The behavior is as though a __taskgroup__ construct encloses the taskloop region with a __task_reduction__ clause, and each taskloop task has an __in_reduction__ clause with the specifications of the __reduction__ clause. At the end of the taskloop region _asum_ contains the result of the reduction." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The next taskloop, _taskloop 2_ , illustrates the use of the __in_reduction__ clause to participate in a previously defined reduction scope of a __parallel__ construct." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The task reductions of _task 2_ and _taskloop 2_ are combined across the __taskloop__ construct and the single __task__ construct, as specified in the __reduction(task,__ __+:asum)__ clause of the __parallel__ construct. At the end of the parallel region _asum_ contains the combined result of all reductions." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**taskloop simd reductions:**" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Reductions for the __taskloop__ __simd__ construct are shown in the second half of the code. Since each component construct, __taskloop__ and __simd__, can accept a reduction-type clause, the __taskloop__ __simd__ construct is a composite construct, and the specific application of the reduction clause is defined within the __taskloop__ __simd__ construct section of the OpenMP 5.0 Specification. The code below illustrates use cases for these reductions." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "In the _taskloop simd reduction_ section of the example below, _taskloop simd 3_ uses the __reduction__ clause in a __taskloop__ __simd__ construct for a sum reduction within a loop. For this case a __reduction__ clause is used, as one would use for a __simd__ construct. 
The SIMD reductions of each task are combined, and the results of these tasks are further combined just as in the __taskloop__ construct with the __reduction__ clause for _taskloop 1_ . At the end of the taskloop region _asum_ contains the combined result of all reductions." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "If a __taskloop__ __simd__ construct is to participate in a previously defined reduction scope, the reduction participation should be specified with an __in_reduction__ clause, as shown in the __parallel__ region enclosing the _task 4_ and _taskloop simd 4_ code sections." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Here the __taskloop__ __simd__ construct's __in_reduction__ clause specifies participation of the construct's tasks as a task reduction within the scope of the parallel region. That is, the results of each task of the __taskloop__ construct component contribute to the reduction at a broader level, just as in the _parallel reduction a_ code section above. Also, each __simd__-component construct occurs as if it has a __reduction__ clause, and the SIMD results of each task are combined as though to form a single result for each task (that participates in the __in_reduction__ clause). At the end of the parallel region _asum_ contains the combined result of all reductions."
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "//%compiler: clang\n", "//%cflags: -fopenmp\n", "\n", "/*\n", "* name: taskloop_simd_reduction.1\n", "* type: C\n", "* version: omp_5.1\n", "*/\n", "#include <stdio.h>\n", "#define N 100\n", "\n", "int main(){\n", " int i, a[N], asum=0;\n", "\n", " /* ... (example truncated) ... */\n", "}" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "//%compiler: clang\n", "//%cflags: -fopenmp\n", "\n", "/*\n", "* name: scope_reduction.1\n", "* type: C++\n", "* version: omp_5.1\n", "*/\n", "#include <stdio.h>\n", "void do_work(int n, float a[], float &s)\n", "{\n", " float loc_s = 0.0f; // local sum\n", " static int nthrs;\n", " #pragma omp for\n", " for (int i = 0; i < n; i++)\n", " loc_s += a[i];\n", " #pragma omp single\n", " {\n", " s = 0.0f; // total sum\n", " nthrs = 0;\n", " }\n", " #pragma omp scope reduction(+:s,nthrs)\n", " {\n", " s += loc_s;\n", " nthrs++;\n", " }\n", " #pragma omp masked\n", " printf(\"total sum = %f, nthrs = %d\\n\", s, nthrs);\n", "}\n", "\n", "float work(int n, float a[])\n", "{\n", " float s;\n", " #pragma omp parallel\n", " {\n", " do_work(n, a, s);\n", " }\n", " return s;\n", "}" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "!!%compiler: gfortran\n", "!!%cflags: -fopenmp\n", "\n", "! name: scope_reduction.1\n", "! type: F-free\n", "! version: omp_5.1\n", "subroutine do_work(n, a, s)\n", " implicit none\n", " integer n, i\n", " real a(*), s, loc_s\n", " integer, save :: nthrs\n", "\n", " loc_s = 0.0 ! local sum\n", " !$omp do\n", " do i = 1, n\n", " loc_s = loc_s + a(i)\n", " end do\n", " !$omp single\n", " s = 0.0 ! 
total sum\n", " nthrs = 0\n", " !$omp end single\n", " !$omp scope reduction(+:s,nthrs)\n", " s = s + loc_s\n", " nthrs = nthrs + 1\n", " !$omp end scope\n", " !$omp masked\n", " print *, \"total sum = \", s, \", nthrs = \", nthrs\n", " !$omp end masked\n", "end subroutine\n", "\n", "function work(n, a) result(s)\n", " implicit none\n", " integer n\n", " real a(*), s\n", "\n", " !$omp parallel\n", " call do_work(n, a, s)\n", " !$omp end parallel\n", "end function" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### User-Defined Reduction" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The __declare__ __reduction__ directive can be used to specify user-defined reductions (UDR) for user data types." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "In the following example, __declare__ __reduction__ directives are used to define _min_ and _max_ operations for the _point_ data structure in order to compute the rectangle that encloses a set of 2-D points." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Each __declare__ __reduction__ directive defines a new reduction identifier, _min_ or _max_ , to be used in a __reduction__ clause. The next item in the declaration list is the data type ( _struct_ _point_ ) used in the reduction, followed by the combiner. Here the combiners are calls to the functions _minproc_ and _maxproc_ , which perform the min and max operations, respectively, on the user data (of type _struct_ _point_ ). The function argument lists contain two special OpenMP variable identifiers, __omp_in__ and __omp_out__, that denote the two values to be combined in the \"real\" function; the __omp_out__ identifier indicates which one is to hold the result." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The initializer of the __declare__ __reduction__ directive specifies the initial value for the private variable of each implicit task. The __omp_priv__ identifier is used to denote the private variable."
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "//%compiler: clang\n", "//%cflags: -fopenmp\n", "\n", "/*\n", "* name: udr.1\n", "* type: C\n", "* version: omp_4.0\n", "*/\n", "#include <stdio.h>\n", "#include <limits.h>\n", "\n", "struct point {\n", " int x;\n", " int y;\n", "};\n", "\n", "void minproc ( struct point *out, struct point *in )\n", "{\n", " if ( in->x < out->x ) out->x = in->x;\n", " if ( in->y < out->y ) out->y = in->y;\n", "}\n", "\n", "void maxproc ( struct point *out, struct point *in )\n", "{\n", " if ( in->x > out->x ) out->x = in->x;\n", " if ( in->y > out->y ) out->y = in->y;\n", "}\n", "\n", "#pragma omp declare reduction(min : struct point : \\\n", " minproc(&omp_out, &omp_in)) \\\n", " initializer( omp_priv = { INT_MAX, INT_MAX } )\n", "\n", "#pragma omp declare reduction(max : struct point : \\\n", " maxproc(&omp_out, &omp_in)) \\\n", " initializer( omp_priv = { 0, 0 } )\n", "\n", "void find_enclosing_rectangle ( int n, struct point points[] )\n", "{\n", " struct point minp = { INT_MAX, INT_MAX }, maxp = {0,0};\n", " int i;\n", "\n", "#pragma omp parallel for reduction(min:minp) reduction(max:maxp)\n", " for ( i = 0; i < n; i++ ) {\n", " minproc(&minp, &points[i]);\n", " maxproc(&maxp, &points[i]);\n", " }\n", " printf(\"min = (%d, %d)\\n\", minp.x, minp.y);\n", " printf(\"max = (%d, %d)\\n\", maxp.x, maxp.y);\n", "}" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The following example shows the corresponding code in Fortran. The __declare__ __reduction__ directives are specified as part of the declaration in subroutine _find_enclosing_rectangle_ and the procedures that perform the min and max operations are specified as subprograms." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "!!%compiler: gfortran\n", "!!%cflags: -fopenmp\n", "\n", "! name: udr.1\n", "! type: F-free\n", "! 
version: omp_4.0\n", "module data_type\n", "\n", " type :: point\n", " integer :: x\n", " integer :: y\n", " end type\n", "\n", "end module data_type\n", "\n", "subroutine find_enclosing_rectangle ( n, points )\n", " use data_type\n", " implicit none\n", " integer :: n\n", " type(point) :: points(*)\n", "\n", " !$omp declare reduction(min : point : minproc(omp_out, omp_in)) &\n", " !$omp& initializer( omp_priv = point( HUGE(0), HUGE(0) ) )\n", "\n", " !$omp declare reduction(max : point : maxproc(omp_out, omp_in)) &\n", " !$omp& initializer( omp_priv = point( 0, 0 ) )\n", "\n", " type(point) :: minp = point( HUGE(0), HUGE(0) ), maxp = point( 0, 0 )\n", " integer :: i\n", "\n", " !$omp parallel do reduction(min:minp) reduction(max:maxp)\n", " do i = 1, n\n", " call minproc(minp, points(i))\n", " call maxproc(maxp, points(i))\n", " end do\n", " print *, \"min = (\", minp%x, minp%y, \")\"\n", " print *, \"max = (\", maxp%x, maxp%y, \")\"\n", "\n", " contains\n", " subroutine minproc ( out, in )\n", " implicit none\n", " type(point), intent(inout) :: out\n", " type(point), intent(in) :: in\n", "\n", " out%x = min( out%x, in%x )\n", " out%y = min( out%y, in%y )\n", " end subroutine minproc\n", "\n", " subroutine maxproc ( out, in )\n", " implicit none\n", " type(point), intent(inout) :: out\n", " type(point), intent(in) :: in\n", "\n", " out%x = max( out%x, in%x )\n", " out%y = max( out%y, in%y )\n", " end subroutine maxproc\n", "\n", "end subroutine" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The following example shows the same computation as _udr.1_ but it illustrates that you can craft complex expressions in the user-defined reduction declaration. In this case, instead of calling the _minproc_ and _maxproc_ functions we inline the code in a single expression." 
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "//%compiler: clang\n", "//%cflags: -fopenmp\n", "\n", "/*\n", "* name: udr.2\n", "* type: C\n", "* version: omp_4.0\n", "*/\n", "#include <stdio.h>\n", "#include <limits.h>\n", "\n", "struct point {\n", " int x;\n", " int y;\n", "};\n", "\n", "#pragma omp declare reduction(min : struct point : \\\n", " omp_out.x = omp_in.x > omp_out.x ? omp_out.x : omp_in.x, \\\n", " omp_out.y = omp_in.y > omp_out.y ? omp_out.y : omp_in.y ) \\\n", " initializer( omp_priv = { INT_MAX, INT_MAX } )\n", "\n", "#pragma omp declare reduction(max : struct point : \\\n", " omp_out.x = omp_in.x < omp_out.x ? omp_out.x : omp_in.x, \\\n", " omp_out.y = omp_in.y < omp_out.y ? omp_out.y : omp_in.y ) \\\n", " initializer( omp_priv = { 0, 0 } )\n", "\n", "void find_enclosing_rectangle ( int n, struct point points[] )\n", "{\n", " struct point minp = { INT_MAX, INT_MAX }, maxp = {0,0};\n", " int i;\n", "\n", "#pragma omp parallel for reduction(min:minp) reduction(max:maxp)\n", " for ( i = 0; i < n; i++ ) {\n", " if ( points[i].x < minp.x ) minp.x = points[i].x;\n", " if ( points[i].y < minp.y ) minp.y = points[i].y;\n", " if ( points[i].x > maxp.x ) maxp.x = points[i].x;\n", " if ( points[i].y > maxp.y ) maxp.y = points[i].y;\n", " }\n", " printf(\"min = (%d, %d)\\n\", minp.x, minp.y);\n", " printf(\"max = (%d, %d)\\n\", maxp.x, maxp.y);\n", "}" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The corresponding code of the same example in Fortran is very similar except that the assignment expression in the __declare__ __reduction__ directive can only be used for a single variable, in this case through a type structure constructor _point( ... )_ ." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "!!%compiler: gfortran\n", "!!%cflags: -fopenmp\n", "\n", "! name: udr.2\n", "! type: F-free\n", "! 
version: omp_4.0\n", "module data_type\n", "\n", " type :: point\n", " integer :: x\n", " integer :: y\n", " end type\n", "\n", "end module data_type\n", "\n", "subroutine find_enclosing_rectangle ( n, points )\n", " use data_type\n", " implicit none\n", " integer :: n\n", " type(point) :: points(*)\n", "\n", " !$omp declare reduction( min : point : &\n", " !$omp& omp_out = point(min( omp_out%x, omp_in%x ), &\n", " !$omp& min( omp_out%y, omp_in%y )) ) &\n", " !$omp& initializer( omp_priv = point( HUGE(0), HUGE(0) ) )\n", "\n", " !$omp declare reduction( max : point : &\n", " !$omp& omp_out = point(max( omp_out%x, omp_in%x ), &\n", " !$omp& max( omp_out%y, omp_in%y )) ) &\n", " !$omp& initializer( omp_priv = point( 0, 0 ) )\n", "\n", " type(point) :: minp = point( HUGE(0), HUGE(0) ), maxp = point( 0, 0 )\n", " integer :: i\n", "\n", " !$omp parallel do reduction(min: minp) reduction(max: maxp)\n", " do i = 1, n\n", " minp%x = min(minp%x, points(i)%x)\n", " minp%y = min(minp%y, points(i)%y)\n", " maxp%x = max(maxp%x, points(i)%x)\n", " maxp%y = max(maxp%y, points(i)%y)\n", " end do\n", " print *, \"min = (\", minp%x, minp%y, \")\"\n", " print *, \"max = (\", maxp%x, maxp%y, \")\"\n", "\n", "end subroutine" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The following example shows the use of special variables in arguments for combiner (__omp_in__ and __omp_out__) and initializer (__omp_priv__ and __omp_orig__) routines. This example returns the maximum value of an array and the corresponding index value. The __declare__ __reduction__ directive specifies a user-defined reduction operation _maxloc_ for data type _struct_ _mx_s_ . The function _mx_combine_ is the combiner and the function _mx_init_ is the initializer." 
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "//%compiler: clang\n", "//%cflags: -fopenmp\n", "\n", "/*\n", "* name: udr.3\n", "* type: C\n", "* version: omp_4.0\n", "*/\n", "\n", "#include <stdio.h>\n", "#define N 100\n", "\n", "struct mx_s {\n", " float value;\n", " int index;\n", "};\n", "\n", "/* prototype functions for combiner and initializer in\n", " the declare reduction */\n", "void mx_combine(struct mx_s *out, struct mx_s *in);\n", "void mx_init(struct mx_s *priv, struct mx_s *orig);\n", "\n", "#pragma omp declare reduction(maxloc: struct mx_s: \\\n", " mx_combine(&omp_out, &omp_in)) \\\n", " initializer(mx_init(&omp_priv, &omp_orig))\n", "\n", "void mx_combine(struct mx_s *out, struct mx_s *in)\n", "{\n", " if ( out->value < in->value ) {\n", " out->value = in->value;\n", " out->index = in->index;\n", " }\n", "}\n", "\n", "void mx_init(struct mx_s *priv, struct mx_s *orig)\n", "{\n", " priv->value = orig->value;\n", " priv->index = orig->index;\n", "}\n", "\n", "int main(void)\n", "{\n", " struct mx_s mx;\n", " float val[N], d;\n", " int i, count = N;\n", "\n", " for (i = 0; i < count; i++) {\n", " d = (N*0.8f - i);\n", " val[i] = N * N - d * d;\n", " }\n", "\n", " mx.value = val[0];\n", " mx.index = 0;\n", " #pragma omp parallel for reduction(maxloc: mx)\n", " for (i = 1; i < count; i++) {\n", " if (mx.value < val[i])\n", " {\n", " mx.value = val[i];\n", " mx.index = i;\n", " }\n", " }\n", "\n", " printf(\"max value = %g, index = %d\\n\", mx.value, mx.index);\n", " /* prints 10000, 80 */\n", "\n", " return 0;\n", "}" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Below is the corresponding Fortran version of the above example. The __declare__ __reduction__ directive specifies the user-defined operation _maxloc_ for user-derived type _mx_s_ . The combiner _mx_combine_ and the initializer _mx_init_ are specified as subprograms."
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "!!%compiler: gfortran\n", "!!%cflags: -fopenmp\n", "\n", "! name: udr.3\n", "! type: F-free\n", "! version: omp_4.0\n", "program max_loc\n", " implicit none\n", "\n", " type :: mx_s\n", " real value\n", " integer index\n", " end type\n", "\n", " !$omp declare reduction(maxloc: mx_s: &\n", " !$omp& mx_combine(omp_out, omp_in)) &\n", " !$omp& initializer(mx_init(omp_priv, omp_orig))\n", "\n", " integer, parameter :: N = 100\n", " type(mx_s) :: mx\n", " real :: val(N), d\n", " integer :: i, count\n", "\n", " count = N\n", " do i = 1, count\n", " d = N*0.8 - i + 1\n", " val(i) = N * N - d * d\n", " enddo\n", "\n", " mx%value = val(1)\n", " mx%index = 1\n", " !$omp parallel do reduction(maxloc: mx)\n", " do i = 2, count\n", " if (mx%value < val(i)) then\n", " mx%value = val(i)\n", " mx%index = i\n", " endif\n", " enddo\n", "\n", " print *, 'max value = ', mx%value, ' index = ', mx%index\n", " ! prints 10000, 81\n", "\n", " contains\n", "\n", " subroutine mx_combine(out, in)\n", " implicit none\n", " type(mx_s), intent(inout) :: out\n", " type(mx_s), intent(in) :: in\n", "\n", " if ( out%value < in%value ) then\n", " out%value = in%value\n", " out%index = in%index\n", " endif\n", " end subroutine mx_combine\n", "\n", " subroutine mx_init(priv, orig)\n", " implicit none\n", " type(mx_s), intent(out) :: priv\n", " type(mx_s), intent(in) :: orig\n", "\n", " priv%value = orig%value\n", " priv%index = orig%index\n", " end subroutine mx_init\n", "\n", "end program" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The following example illustrates a few details of user-defined reductions in Fortran modules. The __declare__ __reduction__ directive is declared in a module ( _data_red_ ). The reduction-identifier _.add._ is a user-defined operator, which allows it to be made accessible in the scope that performs the reduction operation. 
The user-defined operator _.add._ and the subroutine _dt_init_ specified in the __initializer__ clause are defined in the same module." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The reduction operation (that is, the __reduction__ clause) is in the main program. The reduction identifier _.add._ is accessible by use association. Since _.add._ is a user-defined operator, the explicit interface should also be accessible by use association in the current program unit. Since the __declare__ __reduction__ directive associated with this __reduction__ clause has an __initializer__ clause, the subroutine specified in that clause must be accessible in the current scoping unit. In this case, the subroutine _dt_init_ is accessible by use association." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "!!%compiler: gfortran\n", "!!%cflags: -fopenmp\n", "\n", "! name: udr.4\n", "! type: F-free\n", "! version: omp_4.0\n", "module data_red\n", "! Declare data type.\n", " type dt\n", " real :: r1\n", " real :: r2\n", " end type\n", "\n", "! Declare the user-defined operator .add.\n", " interface operator(.add.)\n", " module procedure addc\n", " end interface\n", "\n", "! Declare the user-defined reduction operator .add.\n", "!$omp declare reduction(.add.:dt:omp_out=omp_out.add.omp_in) &\n", "!$omp& initializer(dt_init(omp_priv))\n", "\n", " contains\n", "! Declare the initialization routine.\n", " subroutine dt_init(u)\n", " type(dt) :: u\n", " u%r1 = 0.0\n", " u%r2 = 0.0\n", " end subroutine\n", "\n", "! Declare the specific procedure for the .add. 
operator.\n", " function addc(x1, x2) result(xresult)\n", " type(dt), intent(in) :: x1, x2\n", " type(dt) :: xresult\n", " xresult%r1 = x1%r1 + x2%r2\n", " xresult%r2 = x1%r2 + x2%r1\n", " end function\n", "\n", "end module data_red\n", "\n", "program main\n", " use data_red, only : dt, dt_init, operator(.add.)\n", "\n", " type(dt) :: xdt1, xdt2\n", " integer :: i\n", "\n", " xdt1 = dt(1.0,2.0)\n", " xdt2 = dt(2.0,3.0)\n", "\n", "! The reduction operation\n", "!$omp parallel do reduction(.add.: xdt1)\n", " do i = 1, 10\n", " xdt1 = xdt1 .add. xdt2\n", " end do\n", "!$omp end parallel do\n", "\n", " print *, xdt1\n", "\n", "end program" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The following example uses user-defined reductions to declare a plus (+) reduction for a C++ class. Because the __declare__ __reduction__ directive is inside the context of the _V_ class, the expressions in the directive are resolved in the context of the class. Also, note that the __initializer__ clause uses a copy constructor to initialize the private variables of the reduction, passing the original variable as its argument through the special variable __omp_orig__." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "//%compiler: clang\n", "//%cflags: -fopenmp\n", "\n", "/*\n", "* name: udr.5\n", "* type: C++\n", "* version: omp_4.0\n", "*/\n", "class V {\n", " float *p;\n", " int n;\n", "\n", "public:\n", " V( int _n ) : n(_n) { p = new float[n]; }\n", " V( const V& m ) : n(m.n) { p = new float[n]; }\n", " ~V() { delete[] p; }\n", "\n", " V& operator+= ( const V& );\n", "\n", " #pragma omp declare reduction( + : V : omp_out += omp_in ) \\\n", " initializer(omp_priv(omp_orig))\n", "};" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The following example shows how user-defined reductions can be defined for some STL containers. 
The first __declare__ __reduction__ defines the plus (+) operation for _std::vector_ by making use of the _std::transform_ algorithm. The second and third define the merge (or concatenation) operation for _std::vector_ and _std::list_ . It shows how the user-defined reduction operation can be applied to specific data types of the STL." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "//%compiler: clang\n", "//%cflags: -fopenmp\n", "\n", "/*\n", "* name: udr.6\n", "* type: C++\n", "* version: omp_4.0\n", "*/\n", "#include <algorithm>\n", "#include <functional>\n", "#include <list>\n", "#include <vector>\n", "\n", "// element-wise sum: the result is written back into omp_out\n", "#pragma omp declare reduction( + : std::vector<int> : \\\n", " std::transform (omp_out.begin(), omp_out.end(), \\\n", " omp_in.begin(), omp_out.begin(), std::plus<int>()))\n", "\n", "#pragma omp declare reduction( merge : std::vector<int> : \\\n", " omp_out.insert(omp_out.end(), omp_in.begin(), omp_in.end()))\n", "\n", "#pragma omp declare reduction( merge : std::list<int> : \\\n", " omp_out.merge(omp_in))" ] } ], "metadata": { "kernelspec": { "display_name": "Native", "language": "native", "name": "native" }, "language_info": { "file_extension": ".c", "mimetype": "text/plain", "name": "c" } }, "nbformat": 4, "nbformat_minor": 4 }