{ "cells": [
 { "cell_type": "markdown", "metadata": {}, "source": [ "# DaCe with Explicit Dataflow in Python\n", "\n", "In this tutorial, we will use the explicit dataflow specification in Python to construct DaCe programs." ] },
 { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "import dace" ] },
 { "cell_type": "markdown", "metadata": {}, "source": [ "Explicit dataflow is a Python-based syntax that is close to defining SDFGs. In explicit `@dace.program`s, the code (Tasklets) and memory movement (Memlets) are specified separately, as we show below.\n", "\n", "## Matrix Transposition\n", "\n", "We begin with a simple example: transposing a matrix (out-of-place).\n", "\n", "First, since we do not know what the matrix sizes will be, we define symbolic sizes:" ] },
 { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [], "source": [ "M = dace.symbol('M')\n", "N = dace.symbol('N')" ] },
 { "cell_type": "markdown", "metadata": {}, "source": [ "We now proceed to define the data-centric part of the application (i.e., the part that can be optimized by DaCe). It is a simple function that, when called, invokes the compilation and optimization procedure. It can also be compiled explicitly, as we show in the next example.\n", "\n", "DaCe programs use explicit types so that they can be compiled. We provide a numpy-compatible set of types that can define N-dimensional tensors. For example, `dace.int64` defines a 64-bit signed integer scalar, and `dace.float32[133,8]` defines a 133-by-8 2D array of 32-bit floating-point values." ] }
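,
 { "cell_type": "markdown", "metadata": {}, "source": [ "As a brief aside, these type annotations are ordinary Python objects and can also be constructed outside of a function signature. The following cell is a minimal, optional sketch with hypothetical sizes (the exact printed representations may differ between DaCe versions):" ] },
 { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Data descriptors built programmatically (hypothetical sizes, for illustration only)\n", "scalar_type = dace.int64            # 64-bit signed integer scalar\n", "matrix_type = dace.float32[133, 8]  # 133-by-8 2D array of 32-bit floats\n", "print(scalar_type)\n", "print(matrix_type)" ] }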
,
 { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [], "source": [ "@dace.program\n", "def transpose(A: dace.float32[M, N], B: dace.float32[N, M]):\n", "    # Inside the function we will define a tasklet in a map, which is shortened\n", "    # to dace.map. We define the map range in the arguments:\n", "    @dace.map\n", "    def mytasklet(i: _[0:M], j: _[0:N]):\n", "        # Pre-declaring the memlets is required in explicit dataflow: tasklets\n", "        # cannot use any external memory apart from the data flowing to/from them.\n", "        a << A[i,j]  # Input memlet (<<)\n", "        b >> B[j,i]  # Output memlet (>>)\n", "\n", "        # The code\n", "        b = a" ] },
 { "cell_type": "markdown", "metadata": {}, "source": [ "And that's it! We will now define a regression test using numpy:" ] },
 { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [], "source": [ "import numpy as np\n", "A = np.random.rand(37, 11).astype(np.float32)\n", "expected = A.transpose()\n", "# Define an array for the output of the dace program\n", "B = np.random.rand(11, 37).astype(np.float32)" ] },
 { "cell_type": "markdown", "metadata": {}, "source": [ "Before we call `transpose`, we can inspect the SDFG:" ] },
 { "cell_type": "code", "execution_count": 5, "metadata": { "scrolled": true }, "outputs": [ { "data": { "text/plain": [ "SDFG (transpose)" ] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "source": [ "sdfg = transpose.to_sdfg()\n", "sdfg" ] },
 { "cell_type": "markdown", "metadata": {}, "source": [ "We can now call `transpose` directly or through the SDFG we created (we use the SDFG here and sketch the direct call further below). When calling it, we need to feed in the symbol values as well as the arguments, since the arrays are `numpy` arrays rather than symbolically-sized `dace` arrays (see the following tutorials). When prompted for transformations, we just press the \"Enter\" key to skip them." ] },
 { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "WARNING: Casting scalar argument \"M\" from int to \n", "WARNING: Casting scalar argument \"N\" from int to \n" ] } ], "source": [ "sdfg(A=A, B=B, M=A.shape[0], N=A.shape[1])" ] },
 { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Difference: 0.0\n" ] } ], "source": [ "print('Difference:', np.linalg.norm(expected - B))" ] }
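,
 { "cell_type": "markdown", "metadata": {}, "source": [ "The same computation can also be invoked through the `@dace.program` itself rather than the SDFG object. The cell below is a minimal sketch: it reuses the arrays from above, passes the symbol values explicitly, and overwrites `B` (depending on the DaCe version, the symbol values may also be inferred from the array shapes):" ] },
 { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Call the @dace.program directly; this compiles the program (if needed) and runs it\n", "transpose(A=A, B=B, M=A.shape[0], N=A.shape[1])\n", "print('Difference:', np.linalg.norm(expected - B))" ] }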
,
 { "cell_type": "markdown", "metadata": {}, "source": [ "## Query (using Streams)\n", "\n", "In this example, we will use the Stream construct and dace ND arrays to create a simple parallel filter. We first define a symbolic size `N` and statically-allocated arrays to work with:" ] },
 { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [], "source": [ "N = dace.symbol('N')\n", "n = 255\n", "\n", "storage = dace.ndarray(shape=[n], dtype=dace.int32)\n", "# The actual size of \"output\" will be less than or equal to N, but we need to\n", "# statically allocate the memory.\n", "output = dace.ndarray(shape=[n], dtype=dace.int32)\n", "# The output size is a scalar\n", "output_size = dace.scalar(dtype=dace.uint32)" ] },
 { "cell_type": "markdown", "metadata": {}, "source": [ "As with `transpose`, the DaCe program consists of a tasklet nested in a Map. It also includes a Stream (to which we push outputs as necessary) that is directly connected to the output array, as well as a conflict-resolution output (because all tasklet instances in the map write to the same address):" ] },
 { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [], "source": [ "@dace.program\n", "def query(data: dace.int32[N], output: dace.int32[N], outsz: dace.int32[1],\n", "          threshold: dace.int32):\n", "    # Define a local, unbounded (buffer_size=0) stream\n", "    S = dace.define_stream(dace.int32, 0)\n", "\n", "    # Filtering tasklet\n", "    @dace.map\n", "    def filter(i: _[0:N]):\n", "        a << data[i]\n", "        # Writing to S (no location necessary) a dynamic number of times (-1)\n", "        out >> S(-1)\n", "        # Writing to outsz dynamically (-1); if there is a conflict, sum the results\n", "        osz >> outsz(-1, lambda a, b: a + b)\n", "\n", "        if a > threshold:\n", "            # Pushing to a stream or writing with a conflict uses the assignment operator\n", "            out = a\n", "            osz = 1\n", "\n", "    # Define a memlet from S to the output\n", "    S >> output" ] },
 { "cell_type": "markdown", "metadata": {}, "source": [ "We can compile `query` without defining anything further. However, before we call `query`, we will need to set the symbol sizes." ] }
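,
 { "cell_type": "markdown", "metadata": {}, "source": [ "As with `transpose`, we could also inspect the SDFG of `query` before compiling it. This optional cell is a minimal sketch; the resulting graph should contain the stream `S` alongside the arrays and the dynamic-count memlets:" ] },
 { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Optional: build and display the SDFG generated for the query program\n", "query_sdfg = query.to_sdfg()\n", "query_sdfg" ] }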
,
 { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [], "source": [ "qfunc = query.compile()" ] },
 { "cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [], "source": [ "thres = 50" ] },
 { "cell_type": "code", "execution_count": 12, "metadata": {}, "outputs": [], "source": [ "# Define some random integers and zero the outputs\n", "import numpy as np\n", "storage[:] = np.random.randint(0, 100, size=n)\n", "output_size[0] = 0\n", "output[:] = np.zeros(n).astype(np.int32)\n", "\n", "# Compute the expected output using numpy\n", "expected = storage[np.where(storage > thres)]" ] },
 { "cell_type": "markdown", "metadata": {}, "source": [ "Here we call the compiled function through its Python function prototype (using keyword arguments), since we do not invoke it through the SDFG:" ] },
 { "cell_type": "code", "execution_count": 13, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "WARNING: Casting scalar argument \"threshold\" from int to \n" ] }, { "data": { "text/plain": [ "array([121], dtype=uint32)" ] }, "execution_count": 13, "metadata": {}, "output_type": "execute_result" } ], "source": [ "qfunc(data=storage, output=output, outsz=output_size, threshold=thres, N=n)\n", "output_size" ] },
 { "cell_type": "code", "execution_count": 14, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Difference: 0.0\n" ] } ], "source": [ "filtered_output = output[:output_size[0]]\n", "# Sort both outputs to avoid concurrency-based reordering\n", "print('Difference:', np.linalg.norm(np.sort(expected) - np.sort(filtered_output)))" ] }
 ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.12.1" } }, "nbformat": 4, "nbformat_minor": 4 }