{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "42d04fab-1afc-4547-ab01-df3e66c8ac56",
   "metadata": {},
   "source": [
    "# Covid-19 Trends Predictions\n",
    "\n",
    "**As a Data Scientist, you will log in with the email and password provided by the data owner and perform Remote Data Science.**"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "7cc0b4d2-a8cb-4832-a0c2-bc4680588498",
   "metadata": {},
   "source": [
    "## Import Libraries"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "id": "b889d920-5f67-4f3b-a909-0d3998575d50",
   "metadata": {},
   "outputs": [],
   "source": [
    "import syft as sy\n",
    "import numpy as np\n",
    "import matplotlib.pyplot as plt\n",
    "import os\n",
    "import pandas as pd\n",
    "\n",
    "%matplotlib inline"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "d8e50aeb-b9ab-449d-8d90-f655543698d5",
   "metadata": {},
   "source": [
    "## Log in to the Domain Node as a Data Scientist"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "id": "7969e7e1-f06b-4176-8afd-de01d42f14ef",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Connecting to None... done! \t Logging into local_node... done!\n"
     ]
    }
   ],
   "source": [
    "ds_node = sy.login(email=\"zoheb@amat.com\", password=\"bazinga\", port=8081)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "67194b5d-0bf1-482b-9df5-1c266f7bb1da",
   "metadata": {},
   "source": [
    "**Let's check our initial privacy budget**\n",
    "\n",
    "The privacy budget limits how much information a data scientist can extract from a dataset. Every published result spends part of it, and results published with less noise spend more. Domains set a privacy budget per data scientist."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "id": "e28db34d-faeb-4468-89b0-f6f50635aca3",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "700.0"
      ]
     },
     "execution_count": 3,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "ds_node.privacy_budget"
   ]
  },
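  {
   "cell_type": "markdown",
   "id": "sketch-budget-ledger",
   "metadata": {},
   "source": [
    "One way to picture a per-user privacy budget is as a ledger that every released result draws from. The sketch below is a toy illustration only: `BudgetLedger` and `spend` are hypothetical names, the starting value of 700 simply mirrors the budget shown above, and this is not how PySyft tracks the budget internally.\n",
    "\n",
    "```python\n",
    "class BudgetLedger:\n",
    "    # Toy per-user ledger (hypothetical): each query spends some epsilon.\n",
    "    def __init__(self, budget):\n",
    "        self.remaining = float(budget)\n",
    "\n",
    "    def spend(self, epsilon):\n",
    "        # Refuse a query that would overspend; otherwise deduct its cost.\n",
    "        if epsilon > self.remaining:\n",
    "            raise ValueError('privacy budget exceeded')\n",
    "        self.remaining -= epsilon\n",
    "        return self.remaining\n",
    "\n",
    "\n",
    "ledger = BudgetLedger(700.0)\n",
    "ledger.spend(100.0)\n",
    "print(ledger.remaining)\n",
    "```\n",
    "\n",
    "In this toy model, once the remaining budget is too small for a query, the options are a cheaper (noisier) query or a larger budget granted by the data owner."
   ]
  },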
  {
   "cell_type": "markdown",
   "id": "ff609a03-69ff-4baa-a997-728f23cf5c31",
   "metadata": {},
   "source": [
    "## View the available datasets on the Node"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "id": "e6c615db-b0f4-44f9-a3c2-4701773042f9",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<style>\n",
       "                #myInput {\n",
       "                  background-position: 10px 12px; /* Position the search icon */\n",
       "                  background-repeat: no-repeat; /* Do not repeat the icon image */\n",
       "                  background-color: #bbb;\n",
       "                  width: 98%; /* Full-width */\n",
       "                  font-size: 14px; /* Increase font-size */\n",
       "                  padding: 12px 20px 12px 40px; /* Add some padding */\n",
       "                  border: 1px solid #ddd; /* Add a grey border */\n",
       "                  margin-bottom: 12px; /* Add some space below the input */\n",
       "                }\n",
       "\n",
       "                #myTable {\n",
       "                  border-collapse: collapse; /* Collapse borders */\n",
       "                  width: 100%; /* Full-width */\n",
       "                  border: 1px solid #ddd; /* Add a grey border */\n",
       "                  font-size: 14px; /* Increase font-size */\n",
       "                }\n",
       "\n",
       "                #myTable th, #myTable td {\n",
       "                  text-align: left; /* Left-align text */\n",
       "                  padding: 10px; /* Add padding */\n",
       "                }\n",
       "\n",
       "                #myTable tr {\n",
       "                  /* Add a bottom border to all table rows */\n",
       "                  border-bottom: 1px solid #ddd;\n",
       "                }\n",
       "\n",
       "                #myTable tr.header, #myTable tr:hover {\n",
       "                  /* Add a grey background color to the table header and on hover */\n",
       "                  background-color: #777;\n",
       "                }\n",
       "                </style>\n",
       "\n",
       "                <table id=\"myTable\" style=\"width:1000px\">\n",
       "                  <tr class=\"header\">\n",
       "                    <th style=\"width:30px\">Idx</th>\n",
       "                    <th style=\"width:20%;\">Name</th>\n",
       "                    <th style=\"width:35%;\">Description</th>\n",
       "                    <th style=\"width:20%;\">Assets</th>\n",
       "                    <th style=\"width:300px;\">Id</th>\n",
       "                  </tr>\n",
       "                \n",
       "\n",
       "          <tr>\n",
       "            <td>[0]</td>\n",
       "            <td>COVID19 Cases in 175 countries</td>\n",
       "            <td>Weekly data for an entire year</td>\n",
       "            <td>[\"Country 0\"] -> Tensor<br /><br />[\"Country 1\"] -> Tensor<br /><br />[\"Country 2\"] -> Tensor<br /><br />...<br /><br /></td>\n",
       "            <td>51da7d0f-7e80-4b82-b5aa-9814a3ee9cef</td>\n",
       "          </tr>\n",
       "        </table>\n",
       "\n",
       "        <script>\n",
       "        function myFunction() {\n",
       "          // Declare variables\n",
       "          var input, filter, table, tr, td, i, txtValue;\n",
       "          input = document.getElementById(\"myInput\");\n",
       "          filter = input.value.toUpperCase();\n",
       "          table = document.getElementById(\"myTable\");\n",
       "          tr = table.getElementsByTagName(\"tr\");\n",
       "\n",
       "          // Loop through all table rows, and hide those who don't match the search query\n",
       "          for (i = 0; i < tr.length; i++) {\n",
       "            name_td = tr[i].getElementsByTagName(\"td\")[1];\n",
       "            desc_td = tr[i].getElementsByTagName(\"td\")[2];\n",
       "            asset_td = tr[i].getElementsByTagName(\"td\")[3];\n",
       "            id_td = tr[i].getElementsByTagName(\"td\")[4];\n",
       "            if (name_td || desc_td || asset_td || id_td) {\n",
       "              name_txtValue = name_td.textContent || name_td.innerText;\n",
       "              desc_txtValue = desc_td.textContent || name_td.innerText;\n",
       "              asset_txtValue = asset_td.textContent || name_td.innerText;\n",
       "              id_txtValue = id_td.textContent || name_td.innerText;\n",
       "              name_bool = name_txtValue.toUpperCase().indexOf(filter) > -1;\n",
       "              desc_bool = desc_txtValue.toUpperCase().indexOf(filter) > -1;\n",
       "              asset_bool = asset_txtValue.toUpperCase().indexOf(filter) > -1;\n",
       "              id_bool = id_txtValue.toUpperCase().indexOf(filter) > -1;\n",
       "              if (name_bool || desc_bool || asset_bool || id_bool) {\n",
       "                tr[i].style.display = \"\";\n",
       "              } else {\n",
       "                tr[i].style.display = \"none\";\n",
       "              }\n",
       "            }\n",
       "          }\n",
       "        }\n",
       "        </script>"
      ],
      "text/plain": [
       "<syft.core.node.common.client_manager.dataset_api.DatasetRequestAPI at 0x1b55e5520>"
      ]
     },
     "execution_count": 4,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "ds_node.datasets"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "40159b99-9f03-4bbd-a13d-4cd01ca6e5bc",
   "metadata": {},
   "source": [
    "## Let's get a pointer to our Dataset\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "id": "0525641c-721c-4c07-8fd8-343de0063768",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Dataset: COVID19 Cases in 175 countries\n",
      "Description: Weekly data for an entire year\n",
      "\n",
      "WARNING: Too many assets to print... truncating... You may run \n",
      "\n",
      " assets = my_dataset.assets \n",
      "\n",
      "to view receive a dictionary you can parse through using Python\n",
      "(as opposed to blowing up your notebook with a massive printed table).\n",
      "\n"
     ]
    },
    {
     "data": {
      "text/html": [
       "<style>\n",
       "        #myInput {\n",
       "          background-position: 10px 12px; /* Position the search icon */\n",
       "          background-repeat: no-repeat; /* Do not repeat the icon image */\n",
       "          background-color: #bbb;\n",
       "          width: 98%; /* Full-width */\n",
       "          font-size: 14px; /* Increase font-size */\n",
       "          padding: 12px 20px 12px 40px; /* Add some padding */\n",
       "          border: 1px solid #ddd; /* Add a grey border */\n",
       "          margin-bottom: 12px; /* Add some space below the input */\n",
       "        }\n",
       "\n",
       "        #myTable {\n",
       "          border-collapse: collapse; /* Collapse borders */\n",
       "          width: 50%; /* Full-width */\n",
       "          border: 1px solid #ddd; /* Add a grey border */\n",
       "          font-size: 14px; /* Increase font-size */\n",
       "        }\n",
       "\n",
       "        #myTable th, #myTable td {\n",
       "          text-align: left; /* Left-align text */\n",
       "          padding: 10px; /* Add padding */\n",
       "        }\n",
       "\n",
       "        #myTable tr {\n",
       "          /* Add a bottom border to all table rows */\n",
       "          border-bottom: 1px solid #ddd;\n",
       "        }\n",
       "\n",
       "        #myTable tr.header, #myTable tr:hover {\n",
       "          /* Add a grey background color to the table header and on hover */\n",
       "          background-color: #777;\n",
       "        }\n",
       "        </style>\n",
       "\n",
       "        <table id=\"myTable\">\n",
       "          <tr class=\"header\">\n",
       "            <th style=\"width:15%;\">Asset Key</th>\n",
       "            <th style=\"width:20%;\">Type</th>\n",
       "            <th style=\"width:10%;\">Shape</th>\n",
       "          </tr>\n",
       "        \n",
       "\n",
       "              <tr>\n",
       "            <td>[\"Country 0\"]</td>\n",
       "            <td>Tensor</td>\n",
       "            <td>(53,)</td>\n",
       "          </tr>\n",
       "\n",
       "              <tr>\n",
       "            <td>[\"Country 1\"]</td>\n",
       "            <td>Tensor</td>\n",
       "            <td>(53,)</td>\n",
       "          </tr>\n",
       "\n",
       "              <tr>\n",
       "            <td>[\"Country 2\"]</td>\n",
       "            <td>Tensor</td>\n",
       "            <td>(53,)</td>\n",
       "          </tr>\n",
       "\n",
       "              <tr>\n",
       "            <td>[\"Country 3\"]</td>\n",
       "            <td>Tensor</td>\n",
       "            <td>(53,)</td>\n",
       "          </tr>\n",
       "\n",
       "              <tr>\n",
       "            <td>[\"Country 4\"]</td>\n",
       "            <td>Tensor</td>\n",
       "            <td>(53,)</td>\n",
       "          </tr>\n",
       "\n",
       "              <tr>\n",
       "            <td>[\"Country 5\"]</td>\n",
       "            <td>Tensor</td>\n",
       "            <td>(53,)</td>\n",
       "          </tr>\n",
       "\n",
       "              <tr>\n",
       "            <td>[\"Country 6\"]</td>\n",
       "            <td>Tensor</td>\n",
       "            <td>(53,)</td>\n",
       "          </tr>\n",
       "\n",
       "              <tr>\n",
       "            <td>[\"Country 7\"]</td>\n",
       "            <td>Tensor</td>\n",
       "            <td>(53,)</td>\n",
       "          </tr>\n",
       "\n",
       "              <tr>\n",
       "            <td>[\"Country 8\"]</td>\n",
       "            <td>Tensor</td>\n",
       "            <td>(53,)</td>\n",
       "          </tr>\n",
       "\n",
       "              <tr>\n",
       "            <td>[\"Country 9\"]</td>\n",
       "            <td>Tensor</td>\n",
       "            <td>(53,)</td>\n",
       "          </tr>\n",
       "\n",
       "              <tr>\n",
       "            <td>[\"Country 10\"]</td>\n",
       "            <td>Tensor</td>\n",
       "            <td>(53,)</td>\n",
       "          </tr>\n",
       "\n",
       "              <tr>\n",
       "            <td>[\"Country 11\"]</td>\n",
       "            <td>Tensor</td>\n",
       "            <td>(53,)</td>\n",
       "          </tr>\n",
       "\n",
       "              <tr>\n",
       "            <td>[\"Country 12\"]</td>\n",
       "            <td>Tensor</td>\n",
       "            <td>(53,)</td>\n",
       "          </tr>\n",
       "\n",
       "              <tr>\n",
       "            <td>[\"Country 13\"]</td>\n",
       "            <td>Tensor</td>\n",
       "            <td>(53,)</td>\n",
       "          </tr>\n",
       "\n",
       "              <tr>\n",
       "            <td>[\"Country 14\"]</td>\n",
       "            <td>Tensor</td>\n",
       "            <td>(53,)</td>\n",
       "          </tr>\n",
       "        </table>\n",
       "\n",
       "        "
      ],
      "text/plain": [
       "<syft.core.node.common.client_manager.dataset_api.Dataset at 0x10f9856a0>"
      ]
     },
     "execution_count": 4,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "covid_ds = ds_node.datasets[0]\n",
    "covid_ds"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "ee056726-22a2-4bc6-b713-f57f1c06b0eb",
   "metadata": {},
   "source": [
    "**Printing the dataset does not reveal its values, so the underlying data cannot be copied.**\n",
    "\n",
    "**Printing it only shows the pointer object for the dataset:**"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "id": "bee3451d-29c6-498f-b35c-89b17086773b",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "<syft.core.node.common.client_manager.dataset_api.Dataset object at 0x10f9856a0>\n"
     ]
    }
   ],
   "source": [
    "print(covid_ds)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "42f7a625-901d-4cdc-8e0b-8779dbcc9a00",
   "metadata": {},
   "source": [
    "## Let's extrapolate the next 3 months for one country's data"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "ef513d73-c230-4c23-8c4c-cd834196f1a7",
   "metadata": {},
   "source": [
    "### Extract the data for a country"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "e89a68d8-c316-4553-8dae-a0a4e1f98775",
   "metadata": {},
   "source": [
    "Create `result`, a pointer to one of the tensors in the selected dataset.\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "id": "277b8461-8ca8-4410-9907-2a1a4bf10693",
   "metadata": {},
   "outputs": [],
   "source": [
    "result = covid_ds[\"Country 0\"]"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "6847c0aa-73ba-4f17-ab48-0fbc46ec650c",
   "metadata": {},
   "source": [
    "`publish` spends part of the privacy budget approved by the data owner and returns the data in a noised form that does not compromise the original dataset. `sigma` sets the scale of the noise that is added: a larger `sigma` gives a noisier result and spends less of the budget."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "id": "af930473-bb78-4532-b627-026cfeceb2e0",
   "metadata": {},
   "outputs": [],
   "source": [
    "published_result = result.publish(sigma=1000)"
   ]
  },
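  {
   "cell_type": "markdown",
   "id": "sketch-gaussian-noise",
   "metadata": {},
   "source": [
    "To build intuition for what a noise scale like `sigma` does, here is a minimal local sketch, assuming the noise is zero-mean Gaussian with standard deviation `sigma`. The series `true_values` and the helper `add_gaussian_noise` are made up for illustration; this is not PySyft's publishing code and it does not touch the remote dataset.\n",
    "\n",
    "```python\n",
    "import numpy as np\n",
    "\n",
    "# Hypothetical stand-in for a private weekly series (53 weeks), not the real data.\n",
    "rng = np.random.default_rng(0)\n",
    "true_values = np.linspace(0.0, 500.0, 53)\n",
    "\n",
    "\n",
    "def add_gaussian_noise(values, sigma, rng):\n",
    "    # Add independent zero-mean Gaussian noise with standard deviation sigma.\n",
    "    return values + rng.normal(loc=0.0, scale=sigma, size=values.shape)\n",
    "\n",
    "\n",
    "low_noise = add_gaussian_noise(true_values, sigma=10, rng=rng)\n",
    "high_noise = add_gaussian_noise(true_values, sigma=1000, rng=rng)\n",
    "\n",
    "# The root-mean-square error grows roughly in proportion to sigma.\n",
    "rmse_low = np.sqrt(np.mean((low_noise - true_values) ** 2))\n",
    "rmse_high = np.sqrt(np.mean((high_noise - true_values) ** 2))\n",
    "print(rmse_low, rmse_high)\n",
    "```\n",
    "\n",
    "With a noise scale far larger than the values themselves, individual points say little on their own, which is why a noised series can contain negative numbers even when the underlying counts cannot."
   ]
  },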
  {
   "cell_type": "markdown",
   "id": "e4bdf2fd-5949-43df-a26d-f9af6eebfa7a",
   "metadata": {},
   "source": [
    "Call `get()` to download the contents of the `published_result` pointer created above. `block_with_timeout(60)` waits up to 60 seconds for the result to become available."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "id": "12e1278e-a39f-49fb-84fc-fcb1bc220a69",
   "metadata": {},
   "outputs": [],
   "source": [
    "published_data = published_result.block_with_timeout(60).get()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "id": "d9f83b15-cfd9-4df9-8548-fca201814f0d",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "array([  100.73878185,   541.4500865 ,  -375.10293856,  2858.73702973,\n",
       "         686.31945154,   -20.76044026,  1197.38230958,   640.06508438,\n",
       "         347.7196077 ,   990.81971463,   588.6657162 ,  1142.07340362,\n",
       "          64.05201107,  1212.90968109,  1354.55840716,   469.85859676,\n",
       "         800.12409571,   406.92008934,  -581.09715378,  -182.33866302,\n",
       "       -1601.69871867,   344.73025418, -1440.2914348 , -1037.69893063,\n",
       "       -1455.43654042,    15.35680767,  -562.58933802,  1449.02276369,\n",
       "        -321.00256185,   455.77455451,  -367.60258788,  1993.28491317,\n",
       "       -1531.85406781,   489.68772356,   354.53473314,    91.88429386,\n",
       "         729.65001485,  1101.29951442,  -257.16234613,    88.52534715,\n",
       "         204.61057498,   321.02971848,  1061.47491978, -1127.56615556,\n",
       "         263.99707188, -1471.40921471,  -207.98838313,   729.49451665,\n",
       "         125.73934123,  1501.26873026,  1553.67660508,   681.24566677,\n",
       "        -973.26207448])"
      ]
     },
     "execution_count": 9,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "published_data"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "3539fc28-2c92-456a-80d1-de34c6563c29",
   "metadata": {},
   "source": [
    "**Check the remaining privacy budget: publishing a result spends part of it, so it should have decreased.**"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "id": "032f84a2-a698-4baf-ac13-554b3587c7b2",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "700.0\n"
     ]
    }
   ],
   "source": [
    "print(ds_node.privacy_budget)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "d9519085",
   "metadata": {},
   "source": [
    "You can request additional budget from the data owner."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "b0e8468b-3756-419c-a8ef-2e2d0b9e225a",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Uncomment to ask the data owner for more budget:\n",
    "# ds_node.request_budget(eps=100, reason=\"I want to do more data exploration\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "fcfb0222-6e64-4684-ab92-1982115097ba",
   "metadata": {},
   "source": [
    "### Load the published data into a pandas DataFrame\n",
    "\n",
    "Let's plot the noisy data. Thanks to differential privacy, it is impossible to reproduce the exact visualization the data owner sees, but the broad statistical properties that matter for modeling are approximately preserved."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "id": "fb6bcf5b-5c71-4c31-9c22-4b2f8b15d117",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "<AxesSubplot:>"
      ]
     },
     "execution_count": 11,
     "metadata": {},
     "output_type": "execute_result"
    },
    {
     "data": {
      "image/png": "\n",
      "text/plain": [
       "<Figure size 432x288 with 1 Axes>"
      ]
     },
     "metadata": {
      "needs_background": "light"
     },
     "output_type": "display_data"
    }
   ],
   "source": [
    "data_df = pd.DataFrame(published_data)\n",
    "data_df.plot(legend=False)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 22,
   "id": "2404cdd0-a2d1-493a-895f-eb2bb54d2847",
   "metadata": {},
   "outputs": [],
   "source": [
    "def plot_extrapolated_country(idx, weeks_ahead=12):\n",
    "    \"\"\"Plot one column of `data_df` with a quadratic extrapolation.\"\"\"\n",
    "    y = data_df.loc[:, idx].values\n",
    "    x = np.arange(len(y))\n",
    "\n",
    "    # Noisy weekly values returned by `publish`\n",
    "    plt.plot(x, y)\n",
    "\n",
    "    # Fit a degree-2 polynomial to the observed weeks\n",
    "    f = np.poly1d(np.polyfit(x, y, 2))\n",
    "\n",
    "    # Extrapolate the next `weeks_ahead` weeks (12 weeks is roughly 3 months)\n",
    "    new_x = np.arange(len(y), len(y) + weeks_ahead)\n",
    "    plt.plot(new_x, f(new_x))"
   ]
  },
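  {
   "cell_type": "markdown",
   "id": "sketch-polyfit-extrapolation",
   "metadata": {},
   "source": [
    "The fitting step in `plot_extrapolated_country` can be checked in isolation. The sketch below uses a made-up, noise-free quadratic series (`weeks` and `cases` are illustrative, not the COVID-19 data) so that the recovered coefficients are easy to verify by eye.\n",
    "\n",
    "```python\n",
    "import numpy as np\n",
    "\n",
    "# Hypothetical series: an exact quadratic trend over 53 weeks (not the real data).\n",
    "weeks = np.arange(53)\n",
    "cases = 2.0 * weeks**2 + 3.0 * weeks + 5.0\n",
    "\n",
    "# Fit a degree-2 polynomial; coefficients are returned highest power first.\n",
    "coeffs = np.polyfit(weeks, cases, 2)\n",
    "trend = np.poly1d(coeffs)\n",
    "\n",
    "# Evaluate the fitted curve on the 12 weeks after the observed range.\n",
    "future_weeks = np.arange(53, 65)\n",
    "forecast = trend(future_weeks)\n",
    "print(np.round(coeffs, 3))\n",
    "print(forecast.shape)\n",
    "```\n",
    "\n",
    "On clean data the fit recovers the generating coefficients up to floating-point error. On heavily noised data the quadratic term is much less stable, so an extrapolation of this kind is best read as a rough trend rather than a forecast."
   ]
  },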
  {
   "cell_type": "code",
   "execution_count": 23,
   "id": "6523209f-6ebc-490d-9950-730c63499d80",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "image/png": "\n",
      "text/plain": [
       "<Figure size 432x288 with 1 Axes>"
      ]
     },
     "metadata": {
      "needs_background": "light"
     },
     "output_type": "display_data"
    }
   ],
   "source": [
    "plot_extrapolated_country(0)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "8d6a9350-0b49-4a07-9f21-df312752eba0",
   "metadata": {},
   "source": [
    "As you can see above, the data is obscured by noise, but the fitted trend still moves in the expected direction.\n",
    "\n",
    "**This is the power of Remote Data Science. We can work with data and benefit from it without directly owning it or compromising the privacy of the people whose data was collected.**"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.9.13"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}