{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Structured data prediction using Cloud ML Engine" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This notebook illustrates:\n", "\n", "1. Exploring a BigQuery dataset using JupyterLab\n", "2. Creating datasets for machine learning using Dataflow\n", "3. Creating a model using feature columns and the Keras API\n", "4. Training on Cloud AI Platform\n", "5. Deploying the model\n", "6. Predicting with the model\n", "\n", "Before starting the lab, install the packages required for this notebook." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "!pip install tensorflow==2.2" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Now restart the kernel by selecting \"Kernel\" -> \"Restart Kernel\" from the menu bar** so that the newly installed packages take effect.\n", "\n", "After restarting the kernel, you can resume code execution from the next cell." 
] }, { "cell_type": "code", "execution_count": 26, "metadata": {}, "outputs": [], "source": [ "# Change these to your own bucket, project, and region to try this notebook out\n", "BUCKET = 'babyweight-keras-ml'\n", "PROJECT = 'babyweight-keras'\n", "REGION = 'us-central1'\n", "NOTEBOOK_DIR = '/home/jupyter/training-data-analyst/blogs/babyweight_keras'" ] }, { "cell_type": "code", "execution_count": 27, "metadata": {}, "outputs": [], "source": [ "import os\n", "os.environ['BUCKET'] = BUCKET\n", "os.environ['PROJECT'] = PROJECT\n", "os.environ['REGION'] = REGION\n", "os.environ['NOTEBOOK_DIR'] = NOTEBOOK_DIR" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "Updated property [core/project].\n", "Updated property [compute/region].\n" ] } ], "source": [ "%%bash\n", "gcloud config set project $PROJECT\n", "gcloud config set compute/region $REGION" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "Creating gs://babyweight-keras-ml/...\n" ] } ], "source": [ "%%bash\n", "if ! gsutil ls | grep -q gs://${BUCKET}/; then\n", "  gsutil mb -l ${REGION} gs://${BUCKET}\n", "fi" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Part 1: Data Analysis and Preparation" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Exploring data\n", "\n", "The data is natality data (records of births in the US). My goal is to predict a baby's weight given a number of factors about the pregnancy and the baby's mother. Later, we will want to split the data into training and eval datasets. We will use a hash of the year-month for that, so that all births from the same month land in the same split." 
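, "\n",
"\n",
"As a rough illustration of that idea (a sketch only, not code this notebook runs): Python has no built-in `FARM_FINGERPRINT`, so the snippet below substitutes SHA-256 from `hashlib` and will not reproduce BigQuery's hash values. The function name `assign_split` and its parameters are hypothetical. It shows that hashing the year-month and taking the result modulo 4 sends every record from a given month to the same split, with roughly one quarter of the months going to eval.\n",
"\n",
"```python\n",
"import hashlib\n",
"\n",
"def assign_split(year, month, num_buckets=4, eval_bucket=3):\n",
"    # Stand-in hash (assumption): SHA-256, not BigQuery's FARM_FINGERPRINT.\n",
"    key = '{}{}'.format(year, month).encode('utf-8')\n",
"    bucket = int(hashlib.sha256(key).hexdigest(), 16) % num_buckets\n",
"    return 'eval' if bucket == eval_bucket else 'train'\n",
"\n",
"# Every record from the same year-month gets the same answer.\n",
"print(assign_split(2005, 7) == assign_split(2005, 7))\n",
"```\n",
"\n",
"Splitting on a hash rather than on a random number keeps the assignment repeatable across runs."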
] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [], "source": [
"query = \"\"\"\n",
"SELECT\n",
"  weight_pounds,\n",
"  is_male,\n",
"  mother_age,\n",
"  plurality,\n",
"  gestation_weeks,\n",
"  FARM_FINGERPRINT(CONCAT(CAST(YEAR AS STRING), CAST(month AS STRING))) AS hashmonth\n",
"FROM\n",
"  publicdata.samples.natality\n",
"WHERE year > 2000\n",
"\"\"\"" ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [ { "data": { "text/html": [
"<div>\n",
"<table border=\"1\" class=\"dataframe\">\n",
"  <thead>\n",
"    <tr style=\"text-align: right;\"><th></th><th>weight_pounds</th><th>is_male</th><th>mother_age</th><th>plurality</th><th>gestation_weeks</th><th>hashmonth</th></tr>\n",
"  </thead>\n",
"  <tbody>\n",
"    <tr><th>0</th><td>7.063611</td><td>True</td><td>32</td><td>1</td><td>37.0</td><td>7108882242435606404</td></tr>\n",
"    <tr><th>1</th><td>4.687028</td><td>True</td><td>30</td><td>3</td><td>33.0</td><td>-7170969733900686954</td></tr>\n",
"    <tr><th>2</th><td>7.561856</td><td>True</td><td>20</td><td>1</td><td>39.0</td><td>6392072535155213407</td></tr>\n",
"    <tr><th>3</th><td>7.561856</td><td>True</td><td>31</td><td>1</td><td>37.0</td><td>-2126480030009879160</td></tr>\n",
"    <tr><th>4</th><td>7.312733</td><td>True</td><td>32</td><td>1</td><td>40.0</td><td>3408502330831153141</td></tr>\n",
"  </tbody>\n",
"</table>\n",
"</div>" ], "text/plain": [
"   weight_pounds  is_male  mother_age  plurality  gestation_weeks  \\\n",
"0       7.063611     True          32          1             37.0   \n",
"1       4.687028     True          30          3             33.0   \n",
"2       7.561856     True          20          1             39.0   \n",
"3       7.561856     True          31          1             37.0   \n",
"4       7.312733     True          32          1             40.0   \n",
"\n",
"              hashmonth  \n",
"0   7108882242435606404  \n",
"1  -7170969733900686954  \n",
"2   6392072535155213407  \n",
"3  -2126480030009879160  \n",
"4   3408502330831153141  " ] }, "execution_count": 6, "metadata": {}, "output_type": "execute_result" } ], "source": [
"from google.cloud import bigquery\n",
"bq = bigquery.Client()\n",
"\n",
"query_job = bq.query(query + \" LIMIT 100\")\n",
"df = query_job.to_dataframe()\n",
"df.head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let's write a query that finds the unique values of a column, along with the number of babies and the average weight for each value." ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [], "source": [
"def get_distinct_values(column_name):\n",
"    # column_name is interpolated directly into the SQL, so pass only trusted column names\n",
"    sql = \"\"\"\n",
"SELECT\n",
"  {0},\n",
"  COUNT(1) AS num_babies,\n",
"  AVG(weight_pounds) AS avg_wt\n",
"FROM\n",
"  publicdata.samples.natality\n",
"WHERE\n",
"  year > 2000\n",
"GROUP BY\n",
"  {0}\n",
"    \"\"\".format(column_name)\n",
"    return bq.query(sql).to_dataframe()" ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "" ] }, "execution_count": 8, "metadata": {}, "output_type": "execute_result" }, { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" }, { "data": { "image/png": "iVBORw0KGgoAAAANSUhEUgAAAWoAAAEWCAYAAABPON1ZAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjMuMiwgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy8vihELAAAACXBIWXMAAAsTAAALEwEAmpwYAAARdElEQVR4nO3df5BV5X3H8c8HxCBCDODSSUN0gUERW0W4EMMKY7VktCKotYO0WuM4bqdjHftjcIy2mWlNO/5qJjA1OhtDbOqPaFQYjVYTrQyIYlwQo4J04g90mx9cSaMgroJ++8e9K8tylz1b7tnz7O77NbPDvXuPZz93Z+fjs88+5zyOCAEA0jWk6AAAgAOjqAEgcRQ1ACSOogaAxFHUAJC4Q/I46ZFHHhmNjY15nBoABqT169e/ExENtV7LpagbGxvV2tqax6kBYECyvbW715j6AIDEUdQAkDiKGgASR1EDQOIoagBIHEUNAImjqAEgcRQ1ACSOogaAxOVyZWJ/0Hj1I0VHGFDevP6soiMAAxYjagBIHEUNAImjqAEgcRQ1ACSOogaAxPW46sP2sZLu7fSpiZK+HhHfyisUMNixKqm++vuqpB6LOiK2SJomSbaHSvofSSvyjQUA6NDbqY/TJb0WEd3uRAAAqK/eFvUFku6p9YLtZtuttlvL5fLBJwMASOpFUds+VNICST+s9XpEtEREKSJKDQ0192cEAPw/9GZEfaakDRHx67zCAAD215uiXqxupj0AAPnJVNS2R0iaJ+nBfOMAALrKdPe8iNglaWzOWQAANXBlIgAkjqIGgMRR1ACQOIoaABJHUQNA4ihqAEgcRQ0AiaOoASBxFDUAJI6iBoDEUdQAkDiKGgASR1EDQOIoagBIHEUNAImjqAEgcRQ1ACSOogaAxFHUAJA4ihoAEpd1F/LP2b7f9qu2N9v+ct7BAAAVmXYhl7RU0mMRcb7tQyWNyDETAKCTHova9mclzZX0VUmKiI8kfZRvLABAhyxTHxMllSV9z/YLtm+3fXjXg2w322613Voul+seFAAGqyxFfYik6ZJujYiTJL0v6equB0VES0SUIqLU0NBQ55gAMHhlKeo2SW0R8Vz1+f2qFDcAoA/0WNQR8StJb9s+tvqp0yVtyjUVAOBTWVd9XCHpruqKj9clXZJfJABAZ5mKOiI2SirlGwUAUAtXJgJA4ihqAEgcRQ0AiaOoASBxFDUAJI6iBoDEUdQAkDiKGgASR1EDQOIoagBIHEUNAImjqAEgcRQ1ACSOogaAxFHUAJA4ihoAEkdRA0DiKGoASBxFDQCJy7Rnou03Je2Q9LGkPRHB/okA0Eey7kIuSX8QEe/klgQAUBNTHwCQuKxFHZJ+bHu97eZaB9hutt1qu7VcLtcvIQAMclmLuikipks6U9Lltud2PSAiWiKiFBGlhoaGuoYEgMEsU1FHxC+q/26TtELSrDxDAQD26rGobR9ue1THY0lfkfRy3sEAABVZVn38jqQVtjuOvzsiHss1FQDgUz0WdUS8LunEPsgCAKiB5XkAkDiKGgASR1EDQOIoagBIHEUNAImjqAEgcRQ1ACSOogaAxFHUAJA4ihoAEkdRA0DiKGoASBxFDQCJo6gBIHEUNQAkjqIGgMRR1ACQOIoaABJHUQNA4jIXte2htl+w/aM8AwEA9tWbEfWVkjbnFQQAUFumorY9XtJZkm7PNw4AoKusI+pvSbpK0ifdHWC72Xar7dZyuVyPbAAAZShq2/MlbYuI9Qc6LiJaIqIUEaWGhoa6BQSAwS7LiLpJ0gLbb0r6gaTTbN+ZayoAwKd6LOqI+FpEjI+IRkkXSPqviLgw92QAAEmsowaA5B3Sm4MjYpWkVbkkAQDUxIgaABJHUQNA4ihqAEgcRQ0Ai
aOoASBxFDUAJI6iBoDEUdQAkDiKGgASR1EDQOIoagBIHEUNAImjqAEgcRQ1ACSOogaAxFHUAJA4ihoAEkdRA0DiKGoASFyPRW17uO2f2n7R9iu2/7EvggEAKrJsbvuhpNMiYqftYZKetv2fEbEu52wAAGUo6ogISTurT4dVPyLPUACAvTLNUdseanujpG2SfhIRz9U4ptl2q+3Wcrlc55gAMHhlKuqI+DgipkkaL2mW7d+rcUxLRJQiotTQ0FDnmAAwePVq1UdE/FbSKkln5BEGALC/LKs+Gmx/rvr4MEl/KOnVnHMBAKqyrPr4vKR/tz1UlWK/LyJ+lG8sAECHLKs+fibppD7IAgCogSsTASBxFDUAJI6iBoDEUdQAkDiKGgASR1EDQOIoagBIHEUNAImjqAEgcRQ1ACSOogaAxFHUAJA4ihoAEkdRA0DiKGoASBxFDQCJo6gBIHEUNQAkjqIGgMRl2YX8i7afsr3Z9iu2r+yLYACAiiy7kO+R9HcRscH2KEnrbf8kIjblnA0AoAwj6oj4ZURsqD7eIWmzpC/kHQwAUNGrOWrbjZJOkvRcjdeabbfabi2Xy3WKBwDIXNS2R0p6QNJfR8R7XV+PiJaIKEVEqaGhoZ4ZAWBQy1TUtoepUtJ3RcSD+UYCAHSWZdWHJX1X0uaI+Gb+kQAAnWUZUTdJukjSabY3Vj/+KOdcAICqHpfnRcTTktwHWQAANXBlIgAkjqIGgMRR1ACQOIoaABJHUQNA4ihqAEgcRQ0AiaOoASBxFDUAJI6iBoDEUdQAkDiKGgASR1EDQOIoagBIHEUNAImjqAEgcRQ1ACSOogaAxFHUAJC4LLuQL7e9zfbLfREIALCvLCPqOySdkXMOAEA3eizqiFgt6Td9kAUAUEPd5qhtN9tutd1aLpfrdVoAGPTqVtQR0RIRpYgoNTQ01Ou0ADDoseoDABJHUQNA4rIsz7tH0rOSjrXdZvvS/GMBADoc0tMBEbG4L4JgYNq9e7fa2trU3t5edJTkDB8+XOPHj9ewYcOKjoLE9VjUwMFoa2vTqFGj1NjYKNtFx0lGRGj79u1qa2vThAkTio6DxDFHjVy1t7dr7NixlHQXtjV27Fh+00AmFDVyR0nXxvcFWVHUAJA45qjRpxqvfqSu53vz+rPqer56W7lypY455hhNnTq16CjoxxhRAzlauXKlNm3aVHQM9HMUNQaFc845RzNmzNDxxx+vlpYW3Xrrrbrqqqs+ff2OO+7QFVdcIUm67rrrNGXKFM2bN0+LFy/WzTffXPOc27Zt04wZMyRJL774omzrrbfekiRNmjRJzzzzjB566CEtWbJE06ZN02uvvZbzu8RAxdQHBoXly5drzJgx+uCDDzRz5kw9+eSTampq0o033ihJuvfee3XttdeqtbVVDzzwgF544QXt2bNH06dP/7SMuxo3bpza29v13nvvac2aNSqVSlqzZo1OOeUUjRs3TrNnz9aCBQs0f/58nX/++X35djHAUNQYFJYtW6YVK1ZIkt5++2298cYbmjhxotatW6fJkydry5Ytampq0tKlS7Vw4UIddthhkqSzzz77gOedPXu21q5dq9WrV+uaa67RY489pojQnDlzcn9PGDwoagx4q1at0hNPPKFnn31WI0aM0Kmnnqr29nYtWrRI9913n6ZMmaJzzz1XthURvTr3nDlztGbNGm3dulULFy7UDTfcINuaP39+Tu8GgxFz1Bjw3n33XY0ePVojRozQq6++qnXr1kmSzjvvPK1cuVL33HOPFi1aJEk65ZRT9PDDD6u9vV07d+7UI48ceJXK3Llzdeedd2ry5MkaMmSIxowZo0cffVRNTU2SpFGjRmnHjh35vkEMeIyo0aeKWE53xhln6LbbbtMJJ5ygY489VieffLIkafTo0Zo6dao2bdqkWbNmSZJmzpypBQsW6MQTT9TRRx+tUqmkI444ottzNzY2SqoUtlQp+ra2No0ePVqSdMEFF+iyyy7TsmXLd
P/992vSpEk5vlMMVO7tr3pZlEqlaG1trft566ne63kHu+4KePPmzTruuOP6OM3B2blzp0aOHKldu3Zp7ty5amlp0fTp03P5Wt19f/j5rK/U19tLku31EVGq9RojaqCL5uZmbdq0Se3t7br44otzK2kgK4oa6OLuu+/e73OXX3651q5du8/nrrzySl1yySV9FQuDGEUNZHDLLbcUHQGDGKs+kLs8/g4yEPB9QVYUNXI1fPhwbd++nVLqomPjgOHDhxcdBf0AUx/I1fjx49XW1qZyuVx0lOR0bMUF9ISiRq6GDRvGVlPAQco09WH7DNtbbP/c9tV5hwIA7NVjUdseKukWSWdKmippsW3ugg4AfSTLiHqWpJ9HxOsR8ZGkH0hamG8sAECHLHPUX5D0dqfnbZK+1PUg282SmqtPd9recvDxIOlISe8UHaInvqHoBCgIP5/1c3R3L2Qp6lpbJe+31ioiWiS19CIUMrDd2t31/0DR+PnsG1mmPtokfbHT8/GSfpFPHABAV1mK+nlJk21PsH2opAskPZRvLABAhx6nPiJij+2/kvS4pKGSlkfEK7knQwemk5Ayfj77QC73owYA1A/3+gCAxFHUAJA4ihoAEkdRJ8YVF9r+evX5UbZnFZ0LQHEo6vR8W9KXJS2uPt+hyr1WgCTYHmH7H2x/p/p8su35RecayCjq9HwpIi6X1C5JEfG/kg4tNhKwj+9J+lCVAYVUuSjuG8XFGfgo6vTsrt6xMCTJdoOkT4qNBOxjUkTcKGm3JEXEB6p9qwnUCUWdnmWSVkgaZ/ufJT0t6V+KjQTs4yPbh2nvYGKSKiNs5IQLXhJke4qk01UZpTwZEZsLjgR8yvY8SX+vyv3pfyypSdJXI2JVkbkGMoo6MbaPqvX5iHirr7MA3bE9VtLJqgwm1kVE8rc67c8o6sTYfkmVXyktabikCZK2RMTxhQYDqmw3SdoYEe/bvlDSdElLI2JrwdEGLOaoExMRvx8RJ1T/nazKDjtPF50L6ORWSbtsnyhpiaStkr5fbKSBjaJOXERskDSz6BxAJ3ui8qv4QknLImKppFEFZxrQsuzwgj5k+287PR2iyq+V5YLiALXssP01SRdKmltdTjqs4EwDGiPq9Izq9PEZSY+IzYSRlkWqLMe7NCJ+pcq+qjcVG2lg44+JCamOTK6PiCVFZwGQDqY+EmH7kOpuOtOLzgLUYnuHamxsrcoKpYiIz/ZxpEGDok7HT1WZj95o+yFJP5T0fseLEfFgUcEASYoI/mBYEIo6PWMkbZd0mvaupw5JFDWSYnucKmv9JXFRVp4o6nSMq674eFl7C7oDf0hAMmwvkPSvkn5X0jZJR0vaLImLsnLCqo90DJU0svoxqtPjjg8gFdepcvn4f0fEBFXuS7O22EgDGyPqdPwyIv6p6BBABrsjYrvtIbaHRMRTtm8oOtRARlGng/v5or/4re2RklZLusv2Nkl7Cs40oLGOOhG2x0TEb4rOAXTH9lER8ZbtwyV9oMrU6Z9JOkLSXRGxvdCAAxhFDSAT2xsiYnr18QMR8cdFZxos+GMigKw6T89NLCzFIERRA8gqunmMnDH1ASAT2x+rcrWsJR0maVfHS+IS8lxR1ACQOKY+ACBxFDUAJI6iBoDEUdToF2w/08dfr9H2y335NYHuUNToFyJidtEZgKJQ1OgXbO+s/vt526ttb7T9su05B/pvbN9ge73tJ2zPsr3K9uvVW3V2jJzX2N5Q/djvfwi2h9q+yfbztn9m+y/ye6fA/ihq9Dd/KunxiJgm6URJGw9w7OGSVkXEDEk7JH1D0jxJ50rquFPhNknzqpdGL5K0rMZ5LpX0bkTMlDRT0mW2Jxz8WwGy4e556G+el7Tc9jBJKyNi4wGO/UjSY9XHL0n6MCJ2235JUmP188Mk/ZvtaZI+lnRMjfN8RdIJts+vPj9C0mRJbxzE+wAyo6jRr0TEattzJZ0l6T9s3xQR3+/m8N2x94quTyR9WD3HJ7Y7f
vb/RtKvVRmdD5HUXuM8lnRFRDxer/cB9AZTH+hXbB8taVtEfEfSd1XZEPhgHKHKpg2fSLpIlZ12unpc0l9WR/GyfUz1Vp9An2BEjf7mVElLbO+WtFPSnx/k+b4t6QHbfyLpKXXa+b2T21WZKtlg25LKks45yK8LZMa9PgAgcUx9AEDimPpAv2f7OUmf6fLpiyLipSLyAPXG1AcAJI6pDwBIHEUNAImjqAEgcRQ1ACTu/wBuEj1Bxj7grQAAAABJRU5ErkJggg==\n", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "df = get_distinct_values('is_male')\n", "df.plot(x='is_male', y='num_babies', kind='bar')\n", "df.plot(x='is_male', y='avg_wt', kind='bar')" ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "" ] }, "execution_count": 9, "metadata": {}, "output_type": "execute_result" }, { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" }, { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "df = get_distinct_values('mother_age')\n", "df = df.sort_values('mother_age')\n", "df.plot(x='mother_age', y='num_babies')\n", "df.plot(x='mother_age', y='avg_wt')" ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "" ] }, "execution_count": 10, "metadata": {}, "output_type": "execute_result" }, { "data": { "image/png": "iVBORw0KGgoAAAANSUhEUgAAAXcAAAEDCAYAAADOc0QpAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjMuMiwgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy8vihELAAAACXBIWXMAAAsTAAALEwEAmpwYAAATF0lEQVR4nO3df7DddZ3f8ecrIZAi2WiRaUdC9maERiIqqVeUKE6wGsJChNLWElcXkQF3C7PMOEGCjjK605ZWh06noNvMTjYqCgKKJZCRON2lkR9tSABp2JRJymTLlSoB1sRgKETe/SM3yeXm17k59+aEz30+/sk5n/P98b7f5L7mm/f3c77fVBWSpLZM6HUBkqTRZ7hLUoMMd0lqkOEuSQ0y3CWpQYa7JDXoqF4XAPDWt761+vr6el2GJL2hrF279vmqOmFfnx0R4d7X18eaNWt6XYYkvaEk+dv9fdbTtkySBUmWbNmypZdlSFJzehruVbW8qq6YOnVqL8uQpOZ4QVWSGnRE9NwlHfleffVVBgYGePnll3tdyrgzefJkpk2bxqRJkzpep6fhnmQBsODkk0/uZRmSOjAwMMCUKVPo6+sjSa/LGTeqihdeeIGBgQFmzJjR8Xr23CV15OWXX+b444832A+zJBx//PEj/h+TPXdJHTPYe+NQjrvhLkkNauaCat/ie3tdAptuOK/XJUiHzWj/zh1Jvz9z587lG9/4Bv39/R0tv2zZMtasWcNNN92012dz5szhoYceGu0SD8ovMUnSGOpFsIMXVCW9QWzatIlTTz2Vyy+/nHe+853MmzeP7du3M3fu3N23L3n++efZdZ+qZcuWceGFF7JgwQJmzJjBTTfdxI033sjs2bP5wAc+wIsvvnjA/d1yyy3MmTOH0047jdWrVwOwevVq5syZw+zZs5kzZw5PPfXU7uWfeeYZ5s+fz8yZM/nqV7+6e/y4447b/frrX/8673vf+3j3u9/N9ddfD8BLL73Eeeedx3ve8x5OO+00fvCDH4zK8bLnLukNY8OGDVx55ZU8+eSTvPnNb+aHP/zhAZdft24d3//+91m9ejVf+tKXOPbYY3nsscc488wz+c53vnPAdV966SUeeughvvnNb/LZz34WgHe84x2sWrWKxx57jK997Wt88Ytf3L386tWr+d73vsfjjz/OHXfcsdf9slauXMmGDRtYvXo1jz/+OGvXrmXVqlX85Cc/4W1vexs///nPWbduHfPnzz/Eo/N6zfTcJbVvxowZnH766QC8973vZdOmTQdc/uyzz2bKlClMmTKFqVOnsmDBAgDe9a538cQTTxxw3YULFwLw4Q9/mK1bt/LrX/+a3/zmN1xyySVs2LCBJLz66qu7l//Yxz7G8ccfD8BFF13EAw888Lqe/cqVK1m5ciWzZ88GYNu2bWzYsIGzzjqLRYsWce2113L++edz1llnjeiY7I/hLukN45hjjtn9euLEiWzfvp2jjjqK1157DWCvueBDl58wYcLu9xMmTGDHjh0H3Nfw6Yd
J+PKXv8zZZ5/NXXfdxaZNm5g7d+4Blx+qqrjuuuv43Oc+t9e+1q5dy4oVK7juuuuYN28eX/nKVw5YWydsy0h6Q+vr62Pt2rUA3HnnnaO23V297wceeICpU6cydepUtmzZwoknngjs7OkP9dOf/pQXX3yR7du38+Mf/5gPfvCDr/v8nHPOYenSpWzbtg2AX/ziFzz33HM8++yzHHvssXzqU59i0aJFPProo6NSv2fukg7JkTJ1cdGiRXziE5/gu9/9Lh/5yEdGbbtvectbmDNnDlu3bmXp0qUAfOELX+CSSy7hxhtv3GtfH/rQh/j0pz/Nxo0b+eQnP7nXNMp58+axfv16zjzzTGDnhdZbbrmFjRs3cs011zBhwgQmTZrEt771rVGpP1U1Khs6pJ3vubfM5Rs2bOhqW85zl8bW+vXrOfXUU3tdxri1r+OfZG1V7XMyvlMhJalBtmUkjVtXXnklDz744OvGrr76ai699NIeVTR6DHdJ49bNN9/c6xLGjLNlJHWsl9foxrNDOe6Gu6SOTJ48mRdeeMGAP8x2Paxj8uTJI1rPtoykjkybNo2BgQE2b97c61LGnV2P2RsJw11SRyZNmjSix7ypt2zLSFKDRv3MPclZwB8ObntWVc0Z7X1Ikg6sozP3JEuTPJdk3bDx+UmeSrIxyWKAqvpZVf0xcA/w7dEvWZJ0MJ22ZZYBr7vJcJKJwM3AucAsYGGSWUMW+SRw6yjUKEkaoY7CvapWAcMfW3IGsLGqnq6qV4DbgAsAkkwHtlTV1tEsVpLUmW4uqJ4IPDPk/cDgGMBlwF8eaOUkVyRZk2SNU6skaXR1E+7Zx1gBVNX1VXXAp8JW1ZKq6q+q/hNOOKGLMiRJw3UT7gPASUPeTwOeHckGkixIsmTLli1dlCFJGq6bcH8EOCXJjCRHAxcDd49kA97yV5LGRqdTIW8FHgZmJhlIcllV7QCuAu4D1gO3V9WTI9m5Z+6SNDY6+hJTVS3cz/gKYMWh7ryqlgPL+/v7Lz/UbUiS9ubtBySpQT0Nd9sykjQ2fIaqJDXIM3dJapBn7pLUIC+oSlKDDHdJapA9d0lqkD13SWqQbRlJapDhLkkNsucuSQ2y5y5JDbItI0kNMtwlqUGGuyQ1yAuqktQgL6hKUoNsy0hSgwx3SWqQ4S5JDTLcJalBhrskNcipkJLUIKdCSlKDbMtIUoMMd0lqkOEuSQ0y3CWpQYa7JDXIcJekBhnuktSgo0Z7g0kmAH8G/B6wpqq+Pdr7kCQdWEdn7kmWJnkuybph4/OTPJVkY5LFg8MXACcCrwIDo1uuJKkTnbZllgHzhw4kmQjcDJwLzAIWJpkFzAQerqrPA38yeqVKkjrVUVumqlYl6Rs2fAawsaqeBkhyGzvP2p8BXhlc5nejVKdGoG/xvb0ugU03nNfrEqRxrZsLqieyM8h3GRgc+xFwTpL/BKza38pJrkiyJsmazZs3d1GGJGm4bi6oZh9jVVW/BS472MpVtQRYAtDf319d1CFJGqabM/cB4KQh76cBz45kA97yV5LGRjfh/ghwSpIZSY4GLgbuHskGvOWvJI2NTqdC3go8DMxMMpDksqraAVwF3AesB26vqidHsnPP3CVpbHQ6W2bhfsZXACsOdedVtRxY3t/ff/mhbkOStDcfsydJDfIxe5LUIG8cJkkNsi0jSQ2yLSNJDbItI0kNsi0jSQ2yLSNJDbItI0kNMtwlqUH23CWpQfbcJalBtmUkqUGGuyQ1yHCXpAYZ7pLUIGfLSFKDnC0jSQ2yLSNJDTLcJalBhrskNchwl6QGGe6S1CCnQkpSg5wKKUkNsi0jSQ0y3CWpQYa7JDXIcJekBhnuktQgw12SGmS4S1KDRj3ck8xN8rMkf55k7mhvX5J0cB2Fe5KlSZ5Lsm7Y+PwkTyXZmGTx4HAB24DJwMDolitJ6kSnZ+7LgPlDB5JMBG4GzgVmAQuTzAJ+VlXnAtcCXx29UiVJneoo3KtqFfDisOEzgI1
V9XRVvQLcBlxQVa8Nfv53wDGjVqkkqWNHdbHuicAzQ94PAO9PchFwDvBm4Kb9rZzkCuAKgOnTp3dRhiRpuG7CPfsYq6r6EfCjg61cVUuAJQD9/f3VRR2SpGG6mS0zAJw05P004NmRbMBb/krS2Ogm3B8BTkkyI8nRwMXA3SPZgLf8laSx0elUyFuBh4GZSQaSXFZVO4CrgPuA9cDtVfXkSHbumbskjY2Oeu5VtXA/4yuAFYe686paDizv7++//FC3IUnam4/Zk6QG+Zg9SWqQNw6TpAbZlpGkBtmWkaQG2ZaRpAZ1c/uBriVZACw4+eSTe1mGGta3+N5el8CmG87rdQkah2zLSFKDbMtIUoMMd0lqkFMhJalB9twlqUG2ZSSpQYa7JDXIcJekBnlBVZIa5AVVSWqQbRlJapDhLkkNMtwlqUGGuyQ1yHCXpAYZ7pLUIOe5S1KDnOcuSQ2yLSNJDTLcJalBhrskNchwl6QGGe6S1CDDXZIaZLhLUoPGJNyTvCnJ2iTnj8X2JUkH1lG4J1ma5Lkk64aNz0/yVJKNSRYP+eha4PbRLFSS1LlOz9yXAfOHDiSZCNwMnAvMAhYmmZXko8DfAL8axTolSSNwVCcLVdWqJH3Dhs8ANlbV0wBJbgMuAI4D3sTOwN+eZEVVvTZ8m0muAK4AmD59+iH/AJKkvXUU7vtxIvDMkPcDwPur6iqAJJ8Bnt9XsANU1RJgCUB/f391UYckaZhuwj37GNsd0lW17KAbSBYAC04++eQuypAkDdfNbJkB4KQh76cBz45kA94VUpLGRjfh/ghwSpIZSY4GLgbuHskGvJ+7JI2NTqdC3go8DMxMMpDksqraAVwF3AesB26vqidHsnPP3CVpbHQ6W2bhfsZXACtGtSJJUtd8zJ4kNcjH7ElSg7xxmCQ1yLaMJDXItowkNci2jCQ1yLaMJDXItowkNci2jCQ1yHCXpAbZc5ekBtlzl6QG2ZaRpAZ18yQmSW8gfYvv7XUJbLrhvF6XMG545i5JDfKCqiQ1yAuqktQg2zKS1CDDXZIaZLhLUoMMd0lqkOEuSQ0y3CWpQc5zl6QGOc9dkhpkW0aSGmS4S1KDDHdJapDhLkkNMtwlqUGGuyQ1aNTDPcmpSf48yZ1J/mS0ty9JOriOwj3J0iTPJVk3bHx+kqeSbEyyGKCq1lfVHwOfAPpHv2RJ0sF0eua+DJg/dCDJROBm4FxgFrAwyazBzz4OPAD811GrVJLUsY7CvapWAS8OGz4D2FhVT1fVK8BtwAWDy99dVXOAPxzNYiVJnTmqi3VPBJ4Z8n4AeH+SucBFwDHAiv2tnOQK4AqA6dOnd1GGJGm4bsI9+xirqrofuP9gK1fVEmAJQH9/f3VRhyRpmG5mywwAJw15Pw14diQb8K6QkjQ2ugn3R4BTksxIcjRwMXD3SDbgXSElaWx0OhXyVuBhYGaSgSSXVdUO4CrgPmA9cHtVPTmSnXvmLkljo6Oee1Ut3M/4Cg5w0bSD7S4Hlvf3919+qNuQJO3N2w9IUoN8zJ4kNcjH7ElSgzxzl6QGdfMlpq55QVVSL/QtvrfXJbDphvPGdPteUJWkBhnuktQge+6S1CBny0hSg2zLSFKDDHdJapA9d0lqkD13SWpQqnr/EKQkm4G/7XUdwFuB53tdxBHA47CHx2IPj8UeR8qx+P2qOmFfHxwR4X6kSLKmqvp7XUeveRz28Fjs4bHY441wLLygKkkNMtwlqUGG++st6XUBRwiPwx4eiz08Fnsc8cfCnrskNcgzd0lqkOEuSQ0y3CWpQYa7SPKOJP8kyXHDxuf3qqZeSXJGkvcNvp6V5PNJ/qDXdR0Jknyn1zUcCZJ8aPDfxbxe13IgXlDdhySXVtVf9rqOwyHJnwJXAuuB04Grq+q/DH72aFX94x6Wd1gluR44l52Pn/wp8H7gfuCjwH1V9a97V93hleTu4UPA2cBfAVTVxw97UT2SZHVVnTH4+nJ2/r7cBcwDllfVDb2
sb38M931I8n+qanqv6zgckvxP4Myq2pakD7gT+G5V/cckj1XV7N5WePgMHovTgWOAXwLTqmprkr8H/I+qencv6zuckjwK/A3wF0CxM9xvBS4GqKr/1rvqDq+hvwdJHgH+oKo2J3kT8N+r6l29rXDfevqA7F5K8sT+PgL+weGspccmVtU2gKralGQucGeS32fnsRhPdlTV74DfJvnfVbUVoKq2J3mtx7Udbv3A1cCXgGuq6vEk28dTqA8xIclb2NnGTlVtBqiql5Ls6G1p+zduw52dAX4O8HfDxgM8dPjL6ZlfJjm9qh4HGDyDPx9YChyRZyRj6JUkx1bVb4H37hpMMhUYV+FeVa8B/yHJHYN//orxmxdTgbXszIZK8g+r6peD16iO2BOg8fqXBXAPcNyuUBsqyf2HvZre+SPgdWcfVbUD+KMk/7k3JfXMh6vq/8HucNtlEnBJb0rqraoaAP5FkvOArb2upxeqqm8/H70G/NPDWMqI2HOXpAY5FVKSGmS4S1KDDHeNG0nuTzIqD1hIMjfJPYOvP55k8eDrC5PMGo19SN0w3KX9SNLRhIOqunvIF1kuBAx39ZzhruYk6Uvyv5J8O8kTSe5McuywZbYNef3PkywbfL0syY1J/hr4d4O3I3goyWODf87cx/4+k+SmJHOAjwNfT/J4krcPfhlo13KnJFk7Vj+3NNR4ngqpts0ELquqB5MsBf7VCNb9R8BHq+p3SX6PnVMkdyT5KPBvgH+2r5Wq6qHBr+3fU1V3AiTZMuR7BJcCyw79R5I6Z7irVc9U1YODr28B/nQE694x+E1V2PkFlm8nOYWdX8OfNMI6/gK4NMnngX8JnDHC9aVDYltGrRr+BY4DvZ887LOXhrz+M+Cvq+o0YME+lj2YH7LzZmTnA2ur6oURri8dEsNdrZqe5MzB1wuBB4Z9/qskpyaZwIG/ZTgV+MXg6890sN/fAFN2vamql4H7gG8B4+JOozoyGO5q1XrgksEbxP19dobrUIvZeQuKvwL+7wG28++Bf5vkQWBiB/u9Dbhm8ALs2wfHvsfO/ymsHEH9Ule8/YCaM3jr4nsGWyk9l2QRMLWqvtzrWjR+eEFVGkNJ7gLeDnyk17VofPHMXZIaZM9dkhpkuEtSgwx3SWqQ4S5JDTLcJalBhrskNej/A8Z+W95qmoVHAAAAAElFTkSuQmCC\n", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" }, { "data": { "image/png": "iVBORw0KGgoAAAANSUhEUgAAAWoAAAEDCAYAAAAcI05xAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjMuMiwgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy8vihELAAAACXBIWXMAAAsTAAALEwEAmpwYAAAQyUlEQVR4nO3dfZBV9X3H8c9HxSJKEHBJOyVk1aEgpsrDhSiLjDFqiSKotYPMNFEm7f5RY8ik1SE6TqbjtPVp0urEcbpjCbY+RwNVtCY+URHFZBG0EXAmKujWKCttRcSNrn77x70r63phz4V77vmx+37N7Oy595x79nOP3M8cf/c8OCIEAEjXQUUHAADsHUUNAImjqAEgcRQ1ACSOogaAxFHUAJC4Q/JY6VFHHRXNzc15rBoABqR169a9ExFN1eblUtTNzc1qb2/PY9UAMCDZ3rqneQx9AEDiKGoASBxFDQCJy2WMGgA++ugjdXR0qKurq+goSRk6dKjGjh2rIUOGZH4NRQ0gFx0dHRo+fLiam5tlu+g4SYgIbd++XR0dHTr66KMzv46hDwC56Orq0ujRoynpXmxr9OjRNf9fBkUNIDeU9OftyzahqAEgBytWrNDGjRvrsq4kx6iblzxUdARJ0pZrzi46AjBg1Ptznfrnc8WKFZo7d64mTZq03+tijxrAgHbuuedq2rRpOv7449XW1qZbbrlFl19++afzly1bpksvvVSSdPXVV2vixIk644wztHDhQt1www1V17lt2zZNmzZNkvTCCy/Itl5//XVJ0rHHHqtnnnlGDzzwgC677DJNnjxZr7zyyn69hyT3qAGgXpYuXapRo0bpgw8+0PTp0/X444+rpaVF1113nSTpnnvu0ZVXXqn29nbdf//9Wr9+vbq7uzV16tRPy7ivMWPGqKurSzt27NDq1atVKpW0evVqzZo1S2PGjNHMmTM1b948zZ07VxdccMF+vweKGsCAdtNNN2n58uWSpDfeeEOvvfaajjnmGK1du1bjx4/Xyy+/rJaWFt14442aP3++DjvsMEnSOeecs9f1zpw5U2vWrNFTTz2lK664Qo888ogiQqecckrd3wNFDWDAWrVqlR577DE9++yzGjZsmE499VR1dXVpwYIFuvfeezVx4kSdd955sq1ab/R9yimnaPXq1dq6davmz5+va6+9VrY1d+7cur+PfseobU+wvaHXzw7b36t7EgCos3fffVcjR47UsGHDtHnzZq1du1aSdP7552vFihW66667tGDBAknSrFmz9OCDD6qrq0s7d+7UQw/t/cvP2bNn6/bbb9f48eN10EEHadSoUXr44YfV0tIiSRo+fLjee++9uryPfos6Il6OiMkRMVnSNEm7JC2vy18HgBzNmTNH3d3dOuGEE3TVVVfppJNOkiSNHDlSkyZN0tatWzVjxgxJ0vTp0zVv3jydeOKJOv/881UqlTRixIg9rrvnmvuzZ8+WVC76I488UiNHjpQkXXjhhbr++us1ZcqU/f4y0bXs7ts+U9IPI6Jlb8uVSqXYn+tRc3gecODbtGmTjjvuuKJj1GTnzp064ogjtGvXLs2ePVttbW2aOnVq3f9OtW1je11ElKotX+sY9YWS7trHbACQtNbWVm3cuFFdXV266KKLcinpfZG5qG0fKmmepB/sYX6rpFZJGjduXF3CAUAj3XnnnZ977pJLLtGaNWs+89zixYu1aNGiRsWqaY/6G5Kej4i3q82MiDZJbVJ56KMO2QCgcDfffHPREWo6M3GhGPYAgIbLVNS2h0k6Q9LP8o0DYCCp9djkwWBftkmmoo6IXRExOiLerfkvABiUhg4dqu3bt1PWvfTcOGDo0KE1vY4zEwHkYuzYsero6FBnZ2fRUZLScyuuWlDUAHIxZ
MiQmm43hT3jMqcAkDiKGgASR1EDQOIoagBIHEUNAImjqAEgcRQ1ACSOogaAxFHUAJA4ihoAEkdRA0DiKGoASBxFDQCJo6gBIHEUNQAkjqIGgMRR1ACQOIoaABKX9S7kR9q+z/Zm25tsn5x3MABAWdZ7Jt4o6ZGIuMD2oZKG5ZgJANBLv0Vt+wuSZku6WJIi4kNJH+YbCwDQI8vQxzGSOiX9xPZ627faPrzvQrZbbbfbbuf28ABQP1mK+hBJUyXdEhFTJL0vaUnfhSKiLSJKEVFqamqqc0wAGLyyFHWHpI6IeK7y+D6VixsA0AD9FnVEvCXpDdsTKk99XdLGXFMBAD6V9aiPSyXdUTni41VJi/KLBADoLVNRR8QGSaV8owAAquHMRABIHEUNAImjqAEgcRQ1ACSOogaAxFHUAJA4ihoAEkdRA0DiKGoASBxFDQCJo6gBIHEUNQAkjqIGgMRR1ACQOIoaABJHUQNA4ihqAEgcRQ0AiaOoASBxme6ZaHuLpPckfSypOyK4f2KDNC95qOgIkqQt15xddARg0Mp6F3JJ+lpEvJNbEgBAVQx9AEDishZ1SPqF7XW2W/MMBAD4rKxDHy0R8abtMZIetb05Ip7qvUClwFslady4cXWOCQCDV6Y96oh4s/J7m6TlkmZUWaYtIkoRUWpqaqpvSgAYxPotatuH2x7eMy3pTEm/zjsYAKAsy9DHFyUtt92z/J0R8UiuqQAAn+q3qCPiVUknNiALAKAKDs8DgMRR1ACQOIoaABJHUQNA4ihqAEgcRQ0AiaOoASBxFDUAJI6iBoDEUdQAkDiKGgASR1EDQOIoagBIHEUNAImjqAEgcRQ1ACSOogaAxFHUAJA4ihoAEpe5qG0fbHu97ZV5BgIAfFYte9SLJW3KKwgAoLpMRW17rKSzJd2abxwAQF9Z96j/SdLlkj7JLwoAoJp+i9r2XEnbImJdP8u12m633d7Z2Vm3gAAw2GXZo26RNM/2Fkl3SzrN9u19F4qItogoRUSpqampzjEBYPDqt6gj4gcRMTYimiVdKOmJiPjz3JMBACRxHDUAJO+QWhaOiFWSVuWSBABQFXvUAJA4ihoAEkdRA0DiKGoASFxNXyYCRWpe8lDRESRJW645u+gIGGTYowaAxFHUAJA4ihoAEkdRA0DiKGoASBxFDQCJo6gBIHEUNQAkjqIGgMRR1ACQOIoaABJHUQNA4ihqAEgcRQ0AiaOoASBx/Ra17aG2f2n7Bdsv2f7bRgQDAJRluXHA7ySdFhE7bQ+R9LTt/4iItTlnAwAoQ1FHREjaWXk4pPITeYYCAOyWaYza9sG2N0jaJunRiHiuyjKtttttt3d2dtY5JgAMXpmKOiI+jojJksZKmmH7K1WWaYuIUkSUmpqa6hwTAAavmo76iIj/k7RK0pw8wgAAPi/LUR9Nto+sTB8m6XRJm3POBQCoyHLUxx9Ius32wSoX+70RsTLfWACAHlmO+nhR0pQGZAGQUfOSh4qOIEnacs3ZRUcYFDgzEQASR1EDQOIoagBIHEUNAImjqAEgcRQ1ACSOogaAxFHUAJC4LGcmAkCyBsPJP+xRA0DiKGoASBxFDQCJo6gBIHEUNQAkjqIGgMRR1ACQOIoaABJHUQNA4ihqAEhclruQf8n2k7Y32X7J9uJGBAMAlGW51ke3pL+OiOdtD5e0zvajEbEx52wAAGXYo46I30bE85Xp9yRtkvSHeQcDAJTVNEZtu1nSFEnP5ZIGAPA5mYva9hGS7pf0vYjYUWV+q+122+2dnZ31zAgAg1qmorY9ROWSviMiflZtmYhoi4hSRJSamprqmREABrUsR31Y0r9I2hQRP8o/EgCgtyx71C2SvinpNNsbKj9n5ZwLAFDR7+F5EfG0JDcgCwCgCs5MBIDEUdQAkDiKGgASR1EDQOIoagBIHEUNAImjqAEgcRQ1ACSOogaAxFHUAJA4ihoAEkdRA0DiKGoASBxFDQCJo6gBIHEUNQAkjqIGgMRR1
ACQOIoaABKX5S7kS21vs/3rRgQCAHxWlj3qZZLm5JwDALAH/RZ1RDwl6X8akAUAUAVj1ACQuLoVte1W2+222zs7O+u1WgAY9OpW1BHRFhGliCg1NTXVa7UAMOgx9AEAictyeN5dkp6VNMF2h+1v5x8LANDjkP4WiIiFjQgCAKiOoQ8ASBxFDQCJo6gBIHEUNQAkjqIGgMRR1ACQOIoaABJHUQNA4ihqAEgcRQ0AiaOoASBxFDUAJI6iBoDEUdQAkDiKGgASR1EDQOIoagBIHEUNAImjqAEgcRQ1ACQuU1HbnmP7Zdu/sb0k71AAgN36LWrbB0u6WdI3JE2StND2pLyDAQDKsuxRz5D0m4h4NSI+lHS3pPn5xgIA9HBE7H0B+wJJcyLiLyqPvynpqxHxnT7LtUpqrTycIOnl+setyVGS3ik4QyrYFruxLXZjW+yWwrb4ckQ0VZtxSIYXu8pzn2v3iGiT1FZjsNzYbo+IUtE5UsC22I1tsRvbYrfUt0WWoY8OSV/q9XispDfziQMA6CtLUf9K0njbR9s+VNKFkh7INxYAoEe/Qx8R0W37O5J+LulgSUsj4qXck+2/ZIZhEsC22I1tsRvbYrekt0W/XyYCAIrFmYkAkDiKGgASR1EDQOIo6gHI9kTbX7d9RJ/n5xSVqSi2Z9ieXpmeZPv7ts8qOlfRbP9r0RlSYXtW5d/FmUVn2ZMB/2Wi7UUR8ZOiczSK7e9KukTSJkmTJS2OiH+vzHs+IqYWGK+hbP9Q5WvUHCLpUUlflbRK0umSfh4Rf1dcusax3fdwWkv6mqQnJCki5jU8VIFs/zIiZlSm/1Llz8tySWdKejAirikyXzWDoahfj4hxRedoFNv/JenkiNhpu1nSfZL+LSJutL0+IqYUm7BxKttisqTfk/SWpLERscP2YZKei4gTiszXKLafl7RR0q0qn1VsSXepfE6EIuI/i0vXeL0/B7Z/JemsiOi0fbiktRHxx8Um/Lwsp5Anz/aLe5ol6YuNzJKAgyNipyRFxBbbp0q6z/aXVf1yAANZd0R8LGmX7VciYockRcQHtj8pOFsjlSQtlnSlpMsiYoPtDwZbQfdykO2RKg/9OiI6JSki3rfdXWy06gZEUatcxn8i6X/7PG9JzzQ+TqHesj05IjZIUmXPeq6kpZKS21PI2Ye2h0XELknTep60PULSoCnqiPhE0j/a/mnl99saOJ/9fTFC0jqV+yFs/35EvFX5TifJnZmB8h9rpaQjesqpN9urGp6mWN+S9Jm9gojolvQt2/9cTKTCzI6I30mfllWPIZIuKiZScSKiQ9Kf2T5b0o6i8xQlIpr3MOsTSec1MEpmA36MGgAOdByeBwCJo6gBIHEUNQ5ItlfZrsuF3m2fantlZXpezw2cbZ/L/UGRAooag4LtTF+cR8QDvU54OFflGzoDhaKokTTbzbY3277N9ou277M9rM8yO3tNX2B7WWV6me0f2X5S0rWV08mfsb2+8ntClb93se0f254paZ6k621vsH1s5cSRnuXG216X1/sGehsoh+dhYJsg6dsRscb2Ukl/VcNr/0jS6RHxse0vqHzIXrft0yX9vaQ/rfaiiHimcur1yoi4T5Jsv9vrGPVFkpbt+1sCsqOocSB4IyLWVKZvl/TdGl7708rZiVL5RIfbbI9X+VTqITXmuFXSItvfl7RA0owaXw/sE4Y+cCDoe7D/3h4P7TPv/V7TV0t6MiK+IumcKsv2536VL/I0V9K6iNhe4+uBfUJR40AwzvbJlemFkp7uM/9t28fZPkh7P7NshKT/rkxfnOHvvidpeM+DiOhS+d6ht0gaNFdkRPEoahwINkm6qHLxrVEqF2VvS1S+jMATkn67l/VcJ+kfbK9R+UbN/blb0mWVLx+PrTx3h8p78L+oIT+wXziFHEmrXKp1ZWW4onC2/0bSiIi4qugsGDz4MhHIyPZyScdKOq3oLBhc2KMGgMQxRg0AiaOoASBxFDUAJI6iBoDEUdQAk
DiKGgAS9/8oGEk2rAoOEwAAAABJRU5ErkJggg==\n", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "df = get_distinct_values('plurality')\n", "df = df.sort_values('plurality')\n", "df.plot(x='plurality', y='num_babies', logy=True, kind='bar')\n", "df.plot(x='plurality', y='avg_wt', kind='bar')" ] }, { "cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "" ] }, "execution_count": 11, "metadata": {}, "output_type": "execute_result" }, { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" }, { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "df = get_distinct_values('gestation_weeks')\n", "df = df.sort_values('gestation_weeks')\n", "df.plot(x='gestation_weeks', y='num_babies', logy=True, kind='bar', color='royalblue')\n", "df.plot(x='gestation_weeks', y='avg_wt', kind='bar', color='royalblue')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "All these factors seem to play a part in the baby's weight. Male babies are heavier on average than female babies. Teenaged and older moms tend to have lower-weight babies. Twins, triplets, etc. weigh less than single births. Premature babies weigh less as well. In addition, it is important to check whether you have enough data (number of babies) for each input value. Otherwise, the model's predictions for input values that don't have enough data may not be reliable.\n", "\n", "In the rest of this notebook, we will use machine learning to combine all of these factors to come up with a prediction of a baby's weight." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Creating an ML dataset using Dataflow\n", "\n", "I'm going to use Cloud Dataflow to read in the BigQuery data, do some preprocessing, and write it out as CSV files.\n", "\n", "Instead of using Beam/Dataflow, I had three other options:\n", "\n", "1. Use Cloud Dataprep to visually author a Dataflow pipeline. Cloud Dataprep also allows me to explore the data, so we could have avoided much of the hand-coding of the Python/pandas plotting calls above as well!\n", "2. Read from BigQuery directly using TensorFlow.\n", "3. Use the BigQuery console (http://bigquery.cloud.google.com) to run a query and save the result as a CSV file. For larger datasets, you may have to select the option to \"allow large results\" and save the result into a CSV file on Google Cloud Storage.\n", "\n", "However, in this case, I want to do some preprocessing. 
I want to modify the data such that we can simulate what is known if no ultrasound has been performed. If I didn't need preprocessing, I could have used the web console. Also, I prefer to script it out rather than run queries on the user interface. Therefore, I am using Cloud Dataflow for the preprocessing." ] }, { "cell_type": "code", "execution_count": 12, "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "/opt/conda/lib/python3.7/site-packages/ipykernel_launcher.py:86: BeamDeprecationWarning: BigQuerySource is deprecated since 2.25.0. Use ReadFromBigQuery instead.\n", "WARNING:apache_beam.runners.interactive.interactive_environment:Dependencies required for Interactive Beam PCollection visualization are not available, please use: `pip install apache-beam[interactive]` to install necessary dependencies to enable all data visualization features.\n" ] }, { "data": { "application/javascript": [ "\n", " if (typeof window.interactive_beam_jquery == 'undefined') {\n", " var jqueryScript = document.createElement('script');\n", " jqueryScript.src = 'https://code.jquery.com/jquery-3.4.1.slim.min.js';\n", " jqueryScript.type = 'text/javascript';\n", " jqueryScript.onload = function() {\n", " var datatableScript = document.createElement('script');\n", " datatableScript.src = 'https://cdn.datatables.net/1.10.20/js/jquery.dataTables.min.js';\n", " datatableScript.type = 'text/javascript';\n", " datatableScript.onload = function() {\n", " window.interactive_beam_jquery = jQuery.noConflict(true);\n", " window.interactive_beam_jquery(document).ready(function($){\n", " \n", " });\n", " }\n", " document.head.appendChild(datatableScript);\n", " };\n", " document.head.appendChild(jqueryScript);\n", " } else {\n", " window.interactive_beam_jquery(document).ready(function($){\n", " \n", " });\n", " }" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "application/javascript": [ "\n", " var import_html = () => {\n", " 
['https://raw.githubusercontent.com/PAIR-code/facets/1.0.0/facets-dist/facets-jupyter.html'].forEach(href => {\n", " var link = document.createElement('link');\n", " link.rel = 'import'\n", " link.href = href;\n", " document.head.appendChild(link);\n", " });\n", " }\n", " if ('import' in document.createElement('link')) {\n", " import_html();\n", " } else {\n", " var webcomponentScript = document.createElement('script');\n", " webcomponentScript.src = 'https://cdnjs.cloudflare.com/ajax/libs/webcomponentsjs/1.3.3/webcomponents-lite.js';\n", " webcomponentScript.type = 'text/javascript';\n", " webcomponentScript.onload = function(){\n", " import_html();\n", " };\n", " document.head.appendChild(webcomponentScript);\n", " }" ] }, "metadata": {}, "output_type": "display_data" }, { "name": "stderr", "output_type": "stream", "text": [ "/opt/conda/lib/python3.7/site-packages/apache_beam/io/gcp/bigquery.py:1971: BeamDeprecationWarning: options is deprecated since First stable release. References to .options will not be supported\n", " temp_location = pcoll.pipeline.options.view_as(\n", "WARNING:root:Make sure that locally built Python SDK docker image has Python 3.7 interpreter.\n", "WARNING:apache_beam.options.pipeline_options:Discarding invalid overrides: {'teardown_policy': 'TEARDOWN_ALWAYS', 'no_save_main_session': True}\n", "WARNING:apache_beam.options.pipeline_options:Discarding invalid overrides: {'teardown_policy': 'TEARDOWN_ALWAYS', 'no_save_main_session': True}\n" ] } ], "source": [ "import apache_beam as beam\n", "import datetime\n", "\n", "def to_csv(rowdict):\n", " # pull columns from BQ and create a line\n", " import hashlib\n", " import copy\n", " CSV_COLUMNS = 'weight_pounds,is_male,mother_age,plurality,gestation_weeks'.split(',')\n", " \n", " # create synthetic data where we assume that no ultrasound has been performed\n", " # and so we don't know sex of the baby. 
Let's assume that we can tell the difference\n", " # between single and multiple, but that determining the exact number\n", " # is error-prone in the absence of an ultrasound.\n", " no_ultrasound = copy.deepcopy(rowdict)\n", " w_ultrasound = copy.deepcopy(rowdict)\n", "\n", " no_ultrasound['is_male'] = 'Unknown'\n", " if rowdict['plurality'] > 1:\n", " no_ultrasound['plurality'] = 'Multiple(2+)'\n", " else:\n", " no_ultrasound['plurality'] = 'Single(1)'\n", " \n", " # Change the plurality column to strings\n", " w_ultrasound['plurality'] = \\\n", " ['Single(1)', 'Twins(2)', 'Triplets(3)', 'Quadruplets(4)', 'Quintuplets(5)'][rowdict['plurality']-1]\n", " \n", " # Write out two rows for each input row, one with ultrasound and one without\n", " for result in [no_ultrasound, w_ultrasound]:\n", " data = ','.join([str(result[k]) if k in result else 'None' for k in CSV_COLUMNS])\n", " key = hashlib.sha224(data.encode('utf-8')).hexdigest() # hash the columns to form a key\n", " yield str('{},{}'.format(data, key))\n", " \n", "def preprocess(in_test_mode):\n", " job_name = 'preprocess-babyweight-features' + '-' + datetime.datetime.now().strftime('%y%m%d-%H%M%S')\n", " \n", " if in_test_mode:\n", " OUTPUT_DIR = './preproc'\n", " else:\n", " OUTPUT_DIR = 'gs://{0}/babyweight/preproc/'.format(BUCKET)\n", " \n", " options = {\n", " 'staging_location': os.path.join(OUTPUT_DIR, 'tmp', 'staging'),\n", " 'temp_location': os.path.join(OUTPUT_DIR, 'tmp'),\n", " 'job_name': job_name,\n", " 'project': PROJECT,\n", " 'teardown_policy': 'TEARDOWN_ALWAYS',\n", " 'max_num_workers': 3, # CHANGE THIS IF YOU HAVE MORE QUOTA\n", " 'no_save_main_session': True\n", " }\n", " opts = beam.pipeline.PipelineOptions(flags=[], **options)\n", " if in_test_mode:\n", " RUNNER = 'DirectRunner'\n", " else:\n", " RUNNER = 'DataflowRunner'\n", " p = beam.Pipeline(RUNNER, options=opts)\n", " query = \"\"\"\n", "SELECT\n", " weight_pounds,\n", " is_male,\n", " mother_age,\n", " plurality,\n", " 
gestation_weeks,\n", " FARM_FINGERPRINT(CONCAT(CAST(YEAR AS STRING), CAST(month AS STRING))) AS hashmonth\n", "FROM\n", " publicdata.samples.natality\n", "WHERE year > 2000\n", "AND weight_pounds > 0\n", "AND mother_age > 0\n", "AND plurality > 0\n", "AND gestation_weeks > 0\n", "AND month > 0\n", " \"\"\"\n", " \n", " if in_test_mode:\n", " query = query + ' LIMIT 100' \n", " \n", " for step in ['train', 'eval']:\n", " if step == 'train':\n", " selquery = 'SELECT * FROM ({}) WHERE ABS(MOD(hashmonth, 4)) < 3'.format(query)\n", " else:\n", " selquery = 'SELECT * FROM ({}) WHERE ABS(MOD(hashmonth, 4)) = 3'.format(query)\n", "\n", " (p \n", " | '{}_read'.format(step) >> beam.io.Read(beam.io.BigQuerySource(query=selquery, use_standard_sql=True))\n", " | '{}_csv'.format(step) >> beam.FlatMap(to_csv)\n", " | '{}_out'.format(step) >> beam.io.Write(beam.io.WriteToText(os.path.join(OUTPUT_DIR, '{}.csv'.format(step))))\n", " )\n", " \n", " job = p.run()\n", " \n", "preprocess(in_test_mode=False)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Note that after you launch this, the actual processing happens in the cloud. Go to the Dataflow section of the GCP web console to monitor the running job. It took about **30 minutes** for me.\n", "\n", "Once the job has completed, run the cell below to check the location of the processed files." 
] }, { "cell_type": "code", "execution_count": 13, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "gs://babyweight-keras-ml/babyweight/preproc/eval.csv-00000-of-00008\n", "gs://babyweight-keras-ml/babyweight/preproc/train.csv-00000-of-00009\n" ] } ], "source": [ "%%bash\n", "gsutil ls gs://${BUCKET}/babyweight/preproc/*-00000*" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Part 2: Developing a Machine Learning Model using TensorFlow and Cloud ML Engine" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Creating a TensorFlow model using the Keras API\n", "\n", "First, write a read_dataset function that builds a dataset object for reading the data." ] }, { "cell_type": "code", "execution_count": 14, "metadata": {}, "outputs": [], "source": [ "import shutil\n", "import numpy as np\n", "import tensorflow as tf\n", "from tensorflow.keras import layers, models" ] }, { "cell_type": "code", "execution_count": 15, "metadata": {}, "outputs": [], "source": [ "CSV_COLUMNS = 'weight_pounds,is_male,mother_age,plurality,gestation_weeks,key'.split(',')\n", "SELECT_COLUMNS = 'weight_pounds,is_male,mother_age,plurality,gestation_weeks'.split(',')\n", "LABEL_COLUMN = 'weight_pounds'\n", "\n", "def read_dataset(prefix, pattern, batch_size=512, eval=False):\n", " # use prefix to create filename\n", " filename = 'gs://{}/babyweight/preproc/{}*{}*'.format(BUCKET, prefix, pattern)\n", " if eval:\n", " dataset = tf.data.experimental.make_csv_dataset(\n", " filename, header=False, batch_size=batch_size,\n", " shuffle=False, num_epochs=1,\n", " column_names=CSV_COLUMNS, label_name=LABEL_COLUMN,\n", " select_columns=SELECT_COLUMNS\n", " )\n", " else:\n", " dataset = tf.data.experimental.make_csv_dataset(\n", " filename, header=False, batch_size=batch_size,\n", " shuffle=True, num_epochs=None,\n", " column_names=CSV_COLUMNS, label_name=LABEL_COLUMN,\n", " select_columns=SELECT_COLUMNS\n", " )\n", " return dataset " ] }, { "cell_type": 
"markdown", "metadata": {}, "source": [ "Next, define the feature columns." ] }, { "cell_type": "code", "execution_count": 16, "metadata": {}, "outputs": [], "source": [ "def get_wide_deep():\n", " # define model inputs\n", " inputs = {}\n", " inputs['is_male'] = layers.Input(shape=(), name='is_male', dtype='string')\n", " inputs['plurality'] = layers.Input(shape=(), name='plurality', dtype='string')\n", " inputs['mother_age'] = layers.Input(shape=(), name='mother_age', dtype='float32')\n", " inputs['gestation_weeks'] = layers.Input(shape=(), name='gestation_weeks', dtype='float32')\n", " \n", " # define column types\n", " is_male = tf.feature_column.categorical_column_with_vocabulary_list(\n", " 'is_male', ['True', 'False', 'Unknown'])\n", " plurality = tf.feature_column.categorical_column_with_vocabulary_list(\n", " 'plurality', ['Single(1)', 'Twins(2)', 'Triplets(3)', 'Quadruplets(4)', 'Quintuplets(5)', 'Multiple(2+)']) \n", " mother_age = tf.feature_column.numeric_column('mother_age')\n", " gestation_weeks = tf.feature_column.numeric_column('gestation_weeks')\n", "\n", " # discretize\n", " age_buckets = tf.feature_column.bucketized_column(\n", " mother_age, boundaries=np.arange(15, 45, 1).tolist())\n", " gestation_buckets = tf.feature_column.bucketized_column(\n", " gestation_weeks, boundaries=np.arange(17, 47, 1).tolist())\n", "\n", " # sparse columns are wide \n", " wide = [tf.feature_column.indicator_column(is_male),\n", " tf.feature_column.indicator_column(plurality),\n", " age_buckets,\n", " gestation_buckets]\n", " \n", " # feature cross all the wide columns and embed into a lower dimension\n", " crossed = tf.feature_column.crossed_column(\n", " [is_male, plurality, age_buckets, gestation_buckets], hash_bucket_size=20000)\n", " embed = tf.feature_column.embedding_column(crossed, 3)\n", " \n", " # continuous columns are deep\n", " deep = [mother_age,\n", " gestation_weeks,\n", " embed]\n", "\n", " return wide, deep, inputs" ] }, { "cell_type": "markdown", 
"metadata": {}, "source": [ "Define the wide and deep model using the Keras API." ] }, { "cell_type": "code", "execution_count": 17, "metadata": {}, "outputs": [], "source": [ "def create_keras_model():\n", " wide, deep, inputs = get_wide_deep()\n", " feature_layer_wide = layers.DenseFeatures(wide, name='wide_features')\n", " feature_layer_deep = layers.DenseFeatures(deep, name='deep_features')\n", "\n", " wide_model = feature_layer_wide(inputs)\n", " \n", " deep_model = layers.Dense(64, activation='relu', name='DNN_layer1')(feature_layer_deep(inputs))\n", " deep_model = layers.Dense(32, activation='relu', name='DNN_layer2')(deep_model)\n", "\n", " wide_deep_model = layers.Dense(1, name='weight')(layers.concatenate([wide_model, deep_model]))\n", " model = models.Model(inputs=inputs, outputs=wide_deep_model)\n", "\n", " # Compile Keras model\n", " model.compile(loss='mse', optimizer=tf.keras.optimizers.Adam(learning_rate=0.0001))\n", " return model" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Finally, train the model locally in the notebook environment." 
] }, { "cell_type": "code", "execution_count": 18, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Epoch 1/10\n", "195/195 [==============================] - 3s 14ms/step - loss: 7.1371 - val_loss: 1.6387\n", "Epoch 2/10\n", "195/195 [==============================] - 3s 15ms/step - loss: 1.4740 - val_loss: 1.4089\n", "Epoch 3/10\n", "195/195 [==============================] - 2s 12ms/step - loss: 1.3739 - val_loss: 1.4015\n", "Epoch 4/10\n", "195/195 [==============================] - 3s 15ms/step - loss: 1.1895 - val_loss: 1.5342\n", "Epoch 5/10\n", "195/195 [==============================] - 3s 13ms/step - loss: 1.1764 - val_loss: 2.1909\n", "Epoch 6/10\n", "195/195 [==============================] - 3s 14ms/step - loss: 1.1800 - val_loss: 2.8775\n", "Epoch 7/10\n", "195/195 [==============================] - 2s 10ms/step - loss: 1.1675 - val_loss: 4.0380\n", "Epoch 8/10\n", "195/195 [==============================] - 3s 14ms/step - loss: 1.1416 - val_loss: 3.9998\n", "Epoch 9/10\n", "195/195 [==============================] - 2s 10ms/step - loss: 1.1240 - val_loss: 4.7713\n", "Epoch 10/10\n", "195/195 [==============================] - 2s 8ms/step - loss: 1.1160 - val_loss: 5.5051\n" ] } ], "source": [ "keras_model = create_keras_model()\n", "\n", "PATTERN = \"00000-of-\" # process only one of the shards, for testing purposes\n", "BATCH_SIZE = 512\n", "NUM_TRAIN_EXAMPLES = 1000000\n", "NUM_EVAL_EXAMPLES = 10000\n", "NUM_EVALS = 10\n", "\n", "training_dataset = read_dataset('train', PATTERN, BATCH_SIZE)\n", "validation_dataset = read_dataset('eval', PATTERN, BATCH_SIZE).take(NUM_EVAL_EXAMPLES//BATCH_SIZE)\n", "\n", "history = keras_model.fit(\n", " training_dataset,\n", " steps_per_epoch=NUM_TRAIN_EXAMPLES//(BATCH_SIZE * NUM_EVALS),\n", " epochs=NUM_EVALS,\n", " validation_data=validation_dataset\n", ")" ] }, { "cell_type": "code", "execution_count": 19, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": 
"stream", "text": [ "Model: \"model\"\n", "__________________________________________________________________________________________________\n", "Layer (type) Output Shape Param # Connected to \n", "==================================================================================================\n", "gestation_weeks (InputLayer) [(None,)] 0 \n", "__________________________________________________________________________________________________\n", "is_male (InputLayer) [(None,)] 0 \n", "__________________________________________________________________________________________________\n", "mother_age (InputLayer) [(None,)] 0 \n", "__________________________________________________________________________________________________\n", "plurality (InputLayer) [(None,)] 0 \n", "__________________________________________________________________________________________________\n", "deep_features (DenseFeatures) (None, 5) 60000 gestation_weeks[0][0] \n", " is_male[0][0] \n", " mother_age[0][0] \n", " plurality[0][0] \n", "__________________________________________________________________________________________________\n", "DNN_layer1 (Dense) (None, 64) 384 deep_features[0][0] \n", "__________________________________________________________________________________________________\n", "wide_features (DenseFeatures) (None, 71) 0 gestation_weeks[0][0] \n", " is_male[0][0] \n", " mother_age[0][0] \n", " plurality[0][0] \n", "__________________________________________________________________________________________________\n", "DNN_layer2 (Dense) (None, 32) 2080 DNN_layer1[0][0] \n", "__________________________________________________________________________________________________\n", "concatenate (Concatenate) (None, 103) 0 wide_features[0][0] \n", " DNN_layer2[0][0] \n", "__________________________________________________________________________________________________\n", "weight (Dense) (None, 1) 104 concatenate[0][0] \n", 
"==================================================================================================\n", "Total params: 62,568\n", "Trainable params: 62,568\n", "Non-trainable params: 0\n", "__________________________________________________________________________________________________\n" ] } ], "source": [ "keras_model.summary()" ] }, { "cell_type": "code", "execution_count": 20, "metadata": {}, "outputs": [ { "data": { "image/png": "\n", "text/plain": [ "" ] }, "execution_count": 20, "metadata": {}, "output_type": "execute_result" } ], "source": [ "tf.keras.utils.plot_model(\n", " model=keras_model, to_file=\"dnn_model.png\", show_shapes=False, rankdir=\"LR\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "You can export the trained model in the saved_model format with the following command." ] }, { "cell_type": "code", "execution_count": 21, "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "WARNING:tensorflow:From /opt/conda/lib/python3.7/site-packages/tensorflow/python/ops/resource_variable_ops.py:1817: calling BaseResourceVariable.__init__ (from tensorflow.python.ops.resource_variable_ops) with constraint is deprecated and will be removed in a future version.\n", "Instructions for updating:\n", "If using Keras pass *_constraint arguments to layers.\n" ] } ], "source": [ "import datetime\n", "ts = datetime.datetime.now().strftime('%Y%m%d%H%M%S')\n", "export_path = \"gs://{}/babyweight/trained_model/export/{}\".format(BUCKET, ts)\n", "keras_model.save(export_path, save_format=\"tf\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "To make predictions locally with the trained model, you need a preprocess function. As shown in a later section, when you deploy the saved_model to AI Platform, this preprocessing is added automatically." 
] }, { "cell_type": "code", "execution_count": 51, "metadata": {}, "outputs": [], "source": [ "def preprocess(req):\n", " data = {}\n", " for instance in req['instances']:\n", " for key, value in instance.items():\n", " if key not in data:\n", " data[key] = []\n", " data[key].append((value,))\n", " return tf.data.Dataset.from_tensor_slices(data)" ] }, { "cell_type": "code", "execution_count": 52, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([[5.602193],\n", " [6.725526]], dtype=float32)" ] }, "execution_count": 52, "metadata": {}, "output_type": "execute_result" } ], "source": [ "request_data = {'instances':\n", " [\n", " {\n", " 'is_male': 'True',\n", " 'mother_age': 26.0,\n", " 'plurality': 'Single(1)',\n", " 'gestation_weeks': 39\n", " },\n", " {\n", " 'is_male': 'False',\n", " 'mother_age': 29.0,\n", " 'plurality': 'Single(1)',\n", " 'gestation_weeks': 38\n", " },\n", " ]\n", "}\n", "\n", "keras_model.predict(preprocess(request_data))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now that we have the TensorFlow code working on a subset of the data (in the code above, I was reading only the 00000-of-x file), we can package the TensorFlow code up as a Python module and train it on Cloud AI Platform." ] }, { "cell_type": "code", "execution_count": 28, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "total 12\n", "-rw-r--r-- 1 jupyter jupyter 0 Nov 23 06:08 __init__.py\n", "-rw-r--r-- 1 jupyter jupyter 4342 Nov 23 06:08 model.py\n", "-rw-r--r-- 1 jupyter jupyter 3860 Nov 23 06:08 task.py\n" ] } ], "source": [ "%%bash\n", "cd $NOTEBOOK_DIR\n", "ls -l trainer" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "After moving the code to a package, make sure it works standalone. (Note the --num-train-examples and --num-eval-examples lines so that I am not trying to boil the ocean on my laptop). Even then, this takes about **a minute** in which you won't see any output..." 
] }, { "cell_type": "code", "execution_count": 30, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Epoch 1/4\n", "19/19 [==============================] - 2s 83ms/step - loss: 9.2353 - val_loss: 5.2838\n", "Epoch 2/4\n", "19/19 [==============================] - 0s 22ms/step - loss: 2.1148 - val_loss: 2.1172\n", "Epoch 3/4\n", "19/19 [==============================] - 0s 23ms/step - loss: 1.3234 - val_loss: 2.0525\n", "Epoch 4/4\n", "19/19 [==============================] - 0s 21ms/step - loss: 1.1866 - val_loss: 2.3361\n", "Model exported to: local-training-output/export/20201123064458\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "2020-11-23 06:44:58.444851: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libcuda.so.1'; dlerror: libcuda.so.1: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/local/cuda/lib64:/usr/local/nccl2/lib:/usr/local/cuda/extras/CUPTI/lib64\n", "2020-11-23 06:44:58.445042: E tensorflow/stream_executor/cuda/cuda_driver.cc:313] failed call to cuInit: UNKNOWN ERROR (303)\n", "2020-11-23 06:44:58.445110: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:156] kernel driver does not appear to be running on this host (tensorflow-2-3-20201123-150230): /proc/driver/nvidia/version does not exist\n", "2020-11-23 06:44:58.445498: I tensorflow/core/platform/cpu_feature_guard.cc:143] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA\n", "2020-11-23 06:44:58.454896: I tensorflow/core/platform/profile_utils/cpu_utils.cc:102] CPU Frequency: 2200195000 Hz\n", "2020-11-23 06:44:58.455422: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x7fdd58000b20 initialized for platform Host (this does not guarantee that XLA will be used). 
Devices:\n", "2020-11-23 06:44:58.455465: I tensorflow/compiler/xla/service/service.cc:176] StreamExecutor device (0): Host, Default Version\n", "2020-11-23 06:44:59.520836: I tensorflow/core/profiler/lib/profiler_session.cc:159] Profiler session started.\n", "2020-11-23 06:45:01.592319: I tensorflow/core/profiler/lib/profiler_session.cc:159] Profiler session started.\n", "2020-11-23 06:45:01.605299: I tensorflow/core/profiler/rpc/client/save_profile.cc:168] Creating directory: local-training-output/tensorboard/20201123064458/train/plugins/profile/2020_11_23_06_45_01\n", "2020-11-23 06:45:01.608469: I tensorflow/core/profiler/rpc/client/save_profile.cc:174] Dumped gzipped tool data for trace.json.gz to local-training-output/tensorboard/20201123064458/train/plugins/profile/2020_11_23_06_45_01/tensorflow-2-3-20201123-150230.trace.json.gz\n", "2020-11-23 06:45:01.610257: I tensorflow/core/profiler/utils/event_span.cc:288] Generation of step-events took 0.022 ms\n", "\n", "2020-11-23 06:45:01.612278: I tensorflow/python/profiler/internal/profiler_wrapper.cc:87] Creating directory: local-training-output/tensorboard/20201123064458/train/plugins/profile/2020_11_23_06_45_01Dumped tool data for overview_page.pb to local-training-output/tensorboard/20201123064458/train/plugins/profile/2020_11_23_06_45_01/tensorflow-2-3-20201123-150230.overview_page.pb\n", "Dumped tool data for input_pipeline.pb to local-training-output/tensorboard/20201123064458/train/plugins/profile/2020_11_23_06_45_01/tensorflow-2-3-20201123-150230.input_pipeline.pb\n", "Dumped tool data for tensorflow_stats.pb to local-training-output/tensorboard/20201123064458/train/plugins/profile/2020_11_23_06_45_01/tensorflow-2-3-20201123-150230.tensorflow_stats.pb\n", "Dumped tool data for kernel_stats.pb to local-training-output/tensorboard/20201123064458/train/plugins/profile/2020_11_23_06_45_01/tensorflow-2-3-20201123-150230.kernel_stats.pb\n", "\n", "2020-11-23 06:45:05.158985: W 
tensorflow/python/util/util.cc:329] Sets are not currently considered sequences, but this may change in the future, so consider avoiding using them.\n", "WARNING:tensorflow:From /opt/conda/lib/python3.7/site-packages/tensorflow/python/ops/resource_variable_ops.py:1817: calling BaseResourceVariable.__init__ (from tensorflow.python.ops.resource_variable_ops) with constraint is deprecated and will be removed in a future version.\n", "Instructions for updating:\n", "If using Keras pass *_constraint arguments to layers.\n", "INFO:tensorflow:Assets written to: local-training-output/export/20201123064458/assets\n" ] } ], "source": [ "%%bash\n", "cd $NOTEBOOK_DIR\n", "gcloud ai-platform local train \\\n", " --package-path trainer \\\n", " --module-name trainer.task \\\n", " --job-dir local-training-output \\\n", " -- \\\n", " --bucket $BUCKET \\\n", " --num-train-examples 10000 \\\n", " --num-eval-examples 1000 \\\n", " --num-evals 4 \\\n", " --learning-rate 0.001" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Once the code works in standalone mode, you can run it on Cloud AI Platform. Because this is on the entire dataset, it will take a while. The training run took about **60 minutes** for me. You can monitor the job from the GCP console in the AI Platform section." ] }, { "cell_type": "code", "execution_count": 34, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "jobId: babyweight_201123_075703\n", "state: QUEUED\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "Job [babyweight_201123_075703] submitted successfully.\n", "Your job is still active. 
You may view the status of your job with the command\n", "\n", " $ gcloud ai-platform jobs describe babyweight_201123_075703\n", "\n", "or continue streaming the logs with the command\n", "\n", " $ gcloud ai-platform jobs stream-logs babyweight_201123_075703\n" ] } ], "source": [ "%%bash\n", "JOB_NAME=babyweight_$(date -u +%y%m%d_%H%M%S)\n", "JOB_DIR=gs://${BUCKET}/babyweight/trained_model\n", "\n", "cd $NOTEBOOK_DIR\n", "gcloud ai-platform jobs submit training $JOB_NAME \\\n", " --job-dir $JOB_DIR \\\n", " --package-path trainer/ \\\n", " --module-name trainer.task \\\n", " --region $REGION \\\n", " --python-version 3.7 \\\n", " --runtime-version 2.2 \\\n", " --scale-tier basic-gpu \\\n", " -- \\\n", " --bucket $BUCKET \\\n", " --num-train-examples 60000000 \\\n", " --num-eval-examples 50000 \\\n", " --num-evals 100 \\\n", " --learning-rate 0.0001" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Training logs are stored in the following location in the GCS bucket." ] }, { "cell_type": "code", "execution_count": 35, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "gs://babyweight-keras-ml/babyweight/trained_model/tensorboard/20201123080216/\n" ] } ], "source": [ "%%bash\n", "gsutil ls gs://${BUCKET}/babyweight/trained_model/tensorboard/ | tail -1" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "You can launch TensorBoard with the following command in Cloud Shell. Use the web preview with port 6006 to access it.\n", "\n", "```\n", "tensorboard --logdir [GCS path]\n", "```\n", "\n", "The following image is an example graph for MSE. 
The final MSE for the eval data is about 1.6, that is, RMSE=1.2lbs" ] }, { "cell_type": "code", "execution_count": 48, "metadata": {}, "outputs": [ { "data": { "image/png": "\n", "text/plain": [ "" ] }, "execution_count": 48, "metadata": { "image/png": { "width": 600 } }, "output_type": "execute_result" } ], "source": [ "from IPython.display import Image\n", "Image(filename='{}/tensorboard.png'.format(NOTEBOOK_DIR), width=600)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Deploying the trained model\n", "\n", "Deploying the trained model to act as a REST web service is a simple gcloud call." ] }, { "cell_type": "code", "execution_count": 50, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "gs://babyweight-keras-ml/babyweight/trained_model/export/20201123080216/\n" ] } ], "source": [ "%%bash\n", "gsutil ls gs://${BUCKET}/babyweight/trained_model/export/ | tail -1" ] }, { "cell_type": "code", "execution_count": 51, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Deploying babyweight v1 from gs://babyweight-keras-ml/babyweight/trained_model/export/20201123080216/ ... this will take a few minutes\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "Using endpoint [https://ml.googleapis.com/]\n", "Created ml engine model [projects/babyweight-keras/models/babyweight].\n", "Using endpoint [https://ml.googleapis.com/]\n", "Creating version (this might take a few minutes)......\ndone.\n" ] } ], "source": [ "%%bash\n", "MODEL_NAME=\"babyweight\"\n", "MODEL_VERSION=\"v1\"\n", "MODEL_LOCATION=$(gsutil ls gs://${BUCKET}/babyweight/trained_model/export/ | tail -1)\n", "echo \"Deploying $MODEL_NAME $MODEL_VERSION from $MODEL_LOCATION ... 
this will take a few minutes\"\n", "#gcloud ml-engine versions delete ${MODEL_VERSION} --model ${MODEL_NAME}\n", "#gcloud ml-engine models delete ${MODEL_NAME}\n", "gcloud ai-platform models create ${MODEL_NAME} --regions $REGION\n", "gcloud ai-platform versions create ${MODEL_VERSION} --model ${MODEL_NAME} \\\n", " --region=global --origin ${MODEL_LOCATION} --runtime-version 2.2" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Using the model to predict\n", "\n", "Send a JSON request to the endpoint of the service to make it predict a baby's weight ... I am going to try out how well the model would have predicted the weights of our two kids and a couple of variations while we are at it ..." ] }, { "cell_type": "code", "execution_count": 52, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "{\n", " \"predictions\": [\n", " {\n", " \"weight\": [\n", " 7.9085187911987305\n", " ]\n", " },\n", " {\n", " \"weight\": [\n", " 7.492921829223633\n", " ]\n", " },\n", " {\n", " \"weight\": [\n", " 6.517521858215332\n", " ]\n", " },\n", " {\n", " \"weight\": [\n", " 6.440060615539551\n", " ]\n", " }\n", " ]\n", "}\n" ] } ], "source": [ "from googleapiclient import discovery\n", "from oauth2client.client import GoogleCredentials\n", "import json\n", "\n", "credentials = GoogleCredentials.get_application_default()\n", "api = discovery.build('ml', 'v1', credentials=credentials, cache_discovery=False)\n", "\n", "request_data = {'instances':\n", " [\n", " {\n", " 'is_male': 'True',\n", " 'mother_age': 26.0,\n", " 'plurality': 'Single(1)',\n", " 'gestation_weeks': 39\n", " },\n", " {\n", " 'is_male': 'False',\n", " 'mother_age': 29.0,\n", " 'plurality': 'Single(1)',\n", " 'gestation_weeks': 38\n", " },\n", " {\n", " 'is_male': 'True',\n", " 'mother_age': 26.0,\n", " 'plurality': 'Triplets(3)',\n", " 'gestation_weeks': 39\n", " },\n", " {\n", " 'is_male': 'Unknown',\n", " 'mother_age': 29.0,\n", " 'plurality': 'Multiple(2+)',\n", " 
'gestation_weeks': 38\n", " },\n", " ]\n", "}\n", "\n", "parent = 'projects/%s/models/%s/versions/%s' % (PROJECT, 'babyweight', 'v1')\n", "response = api.projects().predict(body=request_data, name=parent).execute()\n", "print(json.dumps(response, sort_keys = True, indent = 4))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "When I ran this, the four predictions for each of the requests in request_data above are 7.9, 7.5, 6.5, and 6.4 pounds. Yours may be different." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Copyright 2020 Google Inc. Licensed under the Apache License, Version 2.0 (the \"License\"); you may not use this file except in compliance with the License. You may obtain a copy of the License at http://www.apache.org/licenses/LICENSE-2.0 Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an \"AS IS\" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License" ] } ], "metadata": { "environment": { "name": "tf2-2-3-gpu.2-3.m59", "type": "gcloud", "uri": "gcr.io/deeplearning-platform-release/tf2-2-3-gpu.2-3:m59" }, "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.7.8" } }, "nbformat": 4, "nbformat_minor": 4 }