{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Diabetes prediction using synthesized health records\n", "\n", "This notebook explores how to train a machine learning model to predict type 2 diabetes using synthesized patient health records. The use of synthesized data allows us to learn about building a model without any concern about the privacy issues surrounding the use of real patient health records.\n", "\n", "## Prerequisites\n", "\n", "This project is part of a series of code patterns pertaining to a fictional health care company called Example Health. This company stores electronic health records in a database on a z/OS server. Before running the notebook, the synthesized health records must be created and loaded into this database. Another project, https://github.com/IBM/example-health-synthea, provides the steps for doing this. The records are created using a tool called Synthea (https://github.com/synthetichealth/synthea), transformed and loaded into the database." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Load and prepare the data" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Set up the information needed for a JDBC connection to your database below\n", "The database must be set up by following the instructions in https://github.com/IBM/example-health-synthea." ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [], "source": [ "credentials_1 = {\n", " 'host':'xxx.yyy.com',\n", " 'port':'nnnn',\n", " 'username':'user',\n", " 'password':'password',\n", " 'database':'location',\n", " 'schema':'SMHEALTH'\n", "}" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Define a function to load data from a database table into a Spark dataframe\n", "\n", "The partitionColumn, lowerBound, upperBound, and numPartitions options are used to load the data more quickly\n", "using multiple JDBC connections. The data is partitioned by patient id. 
It is assumed that there are approximately\n", "5000 patients in the database. If there are more or fewer patients, adjust the upperBound value accordingly." ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [], "source": [ "def load_data_from_database(table_name):\n", " return (\n", " spark.read.format(\"jdbc\").options(\n", " driver = \"com.ibm.db2.jcc.DB2Driver\",\n", " url = \"jdbc:db2://\" + credentials_1[\"host\"] + \":\" + credentials_1[\"port\"] + \"/\" + credentials_1[\"database\"],\n", " user = credentials_1[\"username\"], \n", " password = credentials_1[\"password\"], \n", " dbtable = credentials_1[\"schema\"] + \".\" + table_name,\n", " partitionColumn = \"patientid\",\n", " lowerBound = 1,\n", " upperBound = 5000,\n", " numPartitions = 10\n", " ).load()\n", " )" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Read patient observations from the database\n", "\n", "The observations include measurements such as blood pressure and cholesterol readings, which are potential features for our model." 
] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "+---------+-----------------+--------+--------------------+------------+--------------+-------+\n", "|PATIENTID|DATEOFOBSERVATION| CODE| DESCRIPTION|NUMERICVALUE|CHARACTERVALUE| UNITS|\n", "+---------+-----------------+--------+--------------------+------------+--------------+-------+\n", "| 222| 2019-01-26|8302-2 | Body Height| 49.00| | cm|\n", "| 222| 2019-01-26|72514-3 |Pain severity - 0...| 1.70| |{score}|\n", "| 222| 2019-01-26|29463-7 | Body Weight| 4.50| | kg|\n", "| 222| 2019-01-26|6690-2 |Leukocytes [#/vol...| 5.10| |10*3/uL|\n", "| 222| 2019-01-26|789-8 |Erythrocytes [#/v...| 5.10| |10*6/uL|\n", "+---------+-----------------+--------+--------------------+------------+--------------+-------+\n", "only showing top 5 rows\n", "\n" ] } ], "source": [ "observations_df = load_data_from_database(\"OBSERVATIONS\")\n", "\n", "observations_df.show(5)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### The observations table has a generalized format with a separate row per observation\n", "\n", "Let's collect the observations that may be of interest in making a diabetes prediction.\n", "First, select systolic blood pressure readings from the observations. These have code 8480-6." 
] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "+---------+-----------------+--------+\n", "|patientid|dateofobservation|systolic|\n", "+---------+-----------------+--------+\n", "| 222| 2019-03-02| 101.30|\n", "| 72| 2009-05-16| 122.70|\n", "| 72| 2010-05-22| 129.10|\n", "| 72| 2011-05-28| 109.00|\n", "| 72| 2012-06-02| 135.40|\n", "+---------+-----------------+--------+\n", "only showing top 5 rows\n", "\n" ] } ], "source": [ "from pyspark.sql.functions import col\n", "\n", "systolic_observations_df = (\n", " observations_df.select(\"patientid\", \"dateofobservation\", \"numericvalue\")\n", " .withColumnRenamed(\"numericvalue\", \"systolic\")\n", " .filter((col(\"code\") == \"8480-6\"))\n", " )\n", "\n", "\n", "systolic_observations_df.show(5)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Select other observations of potential interest\n", "\n", "* Select diastolic blood pressure readings (code 8462-4).\n", "* Select HDL cholesterol readings (code 2085-9).\n", "* Select LDL cholesterol readings (code 18262-6).\n", "* Select BMI (body mass index) readings (code 39156-5)." 
] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [], "source": [ "diastolic_observations_df = (\n", " observations_df.select(\"patientid\", \"dateofobservation\", \"numericvalue\")\n", " .withColumnRenamed('numericvalue', 'diastolic')\n", " .filter((col(\"code\") == \"8462-4\"))\n", " )\n", "\n", "hdl_observations_df = (\n", " observations_df.select(\"patientid\", \"dateofobservation\", \"numericvalue\")\n", " .withColumnRenamed('numericvalue', 'hdl')\n", " .filter((col(\"code\") == \"2085-9\"))\n", " )\n", "\n", "ldl_observations_df = (\n", " observations_df.select(\"patientid\", \"dateofobservation\", \"numericvalue\")\n", " .withColumnRenamed('numericvalue', 'ldl')\n", " .filter((col(\"code\") == \"18262-6\"))\n", " )\n", "\n", "bmi_observations_df = (\n", " observations_df.select(\"patientid\", \"dateofobservation\", \"numericvalue\")\n", " .withColumnRenamed('numericvalue', 'bmi')\n", " .filter((col(\"code\") == \"39156-5\"))\n", " )" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Join the observations for each patient by date into one dataframe" ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "+---------+-----------------+--------+---------+-----+------+-----+\n", "|patientid|dateofobservation|systolic|diastolic| hdl| ldl| bmi|\n", "+---------+-----------------+--------+---------+-----+------+-----+\n", "| 4| 2011-12-17| 105.10| 77.10|71.00| 86.50|57.70|\n", "| 157| 2014-07-16| 138.00| 83.70|21.10|181.40|37.90|\n", "| 230| 2010-04-23| 164.70| 117.90|26.20|147.90|35.20|\n", "| 244| 2015-04-01| 119.00| 84.30|77.60| 96.20|25.50|\n", "| 290| 2018-08-21| 130.60| 70.90|73.90| 77.80|47.10|\n", "+---------+-----------------+--------+---------+-----+------+-----+\n", "only showing top 5 rows\n", "\n" ] } ], "source": [ "merged_observations_df = (\n", " systolic_observations_df.join(diastolic_observations_df, [\"patientid\", 
\"dateofobservation\"])\n", " .join(hdl_observations_df, [\"patientid\", \"dateofobservation\"])\n", " .join(ldl_observations_df, [\"patientid\", \"dateofobservation\"])\n", " .join(bmi_observations_df, [\"patientid\", \"dateofobservation\"])\n", ")\n", "\n", "merged_observations_df.show(5)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Another possible feature is the patient's age at the time of observation\n", "\n", "Load the patients' birth dates from the database into a dataframe." ] }, { "cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "+---------+-----------+\n", "|patientid|dateofbirth|\n", "+---------+-----------+\n", "| 1| 2017-07-04|\n", "| 2| 1965-04-14|\n", "| 3| 1996-09-14|\n", "| 4| 1958-11-29|\n", "| 5| 1979-01-28|\n", "+---------+-----------+\n", "only showing top 5 rows\n", "\n" ] } ], "source": [ "patients_df = load_data_from_database(\"PATIENT\").select(\"patientid\", \"dateofbirth\")\n", "\n", "patients_df.show(5)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Add a column containing the patient's age to the merged observations." 
] }, { "cell_type": "code", "execution_count": 12, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "+---------+-----------------+--------+---------+-----+-----+-----+-----------------+\n", "|patientid|dateofobservation|systolic|diastolic| hdl| ldl| bmi| age|\n", "+---------+-----------------+--------+---------+-----+-----+-----+-----------------+\n", "| 463| 2016-02-13| 136.90| 81.10|66.60|76.20|35.80|55.57808219178082|\n", "| 463| 2013-01-26| 113.40| 77.50|77.30|91.40|35.80|52.52876712328767|\n", "| 463| 2019-03-02| 123.60| 71.60|73.80|95.50|35.80|58.62739726027397|\n", "| 463| 2010-01-09| 113.50| 70.60|71.20|76.00|35.80|49.47945205479452|\n", "| 471| 2017-07-12| 155.60| 99.00|59.00|83.70|38.30|35.19178082191781|\n", "+---------+-----------------+--------+---------+-----+-----+-----+-----------------+\n", "only showing top 5 rows\n", "\n" ] } ], "source": [ "from pyspark.sql.functions import datediff\n", "\n", "merged_observations_with_age_df = (\n", " merged_observations_df.join(patients_df, \"patientid\")\n", " .withColumn(\"age\", datediff(col(\"dateofobservation\"), col(\"dateofbirth\"))/365)\n", " .drop(\"dateofbirth\")\n", " )\n", "\n", "merged_observations_with_age_df.show(5)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Find the patients that have been diagnosed with type 2 diabetes\n", "\n", "The conditions table contains the conditions that patients have and the date they were diagnosed.\n", "Load the patient conditions table and select the patients that have been diagnosed with type 2 diabetes.\n", "Keep the date they were diagnosed (\"start\" column)." 
] }, { "cell_type": "code", "execution_count": 13, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "+---------+----------+\n", "|patientid| start|\n", "+---------+----------+\n", "| 66|2003-06-28|\n", "| 281|2012-07-20|\n", "| 230|2008-04-18|\n", "| 157|1994-12-28|\n", "| 251|2011-02-11|\n", "+---------+----------+\n", "only showing top 5 rows\n", "\n" ] } ], "source": [ "diabetics_df = (\n", " load_data_from_database(\"CONDITIONS\")\n", " .select(\"patientid\", \"start\")\n", " .filter(col(\"description\") == \"Diabetes\")\n", ")\n", "\n", "diabetics_df.show(5)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Create a \"diabetic\" column which is the \"label\" for the model to predict\n", "\n", "Join the merged observations with the diabetic patients.\n", "This is a left join so that we keep all observations for both diabetic and non-diabetic patients.\n", "Create a new column with a binary value, 1=diabetic, 0=non-diabetic.\n", "This will be the label for the model (the value it is trying to predict)." 
] }, { "cell_type": "code", "execution_count": 14, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "+---------+-----------------+--------+---------+-----+-----+-----+-----------------+-----+--------+\n", "|patientid|dateofobservation|systolic|diastolic| hdl| ldl| bmi| age|start|diabetic|\n", "+---------+-----------------+--------+---------+-----+-----+-----+-----------------+-----+--------+\n", "| 463| 2013-01-26| 113.40| 77.50|77.30|91.40|35.80|52.52876712328767| null| 0|\n", "| 463| 2010-01-09| 113.50| 70.60|71.20|76.00|35.80|49.47945205479452| null| 0|\n", "| 463| 2016-02-13| 136.90| 81.10|66.60|76.20|35.80|55.57808219178082| null| 0|\n", "| 463| 2019-03-02| 123.60| 71.60|73.80|95.50|35.80|58.62739726027397| null| 0|\n", "| 471| 2017-07-12| 155.60| 99.00|59.00|83.70|38.30|35.19178082191781| null| 0|\n", "+---------+-----------------+--------+---------+-----+-----+-----+-----------------+-----+--------+\n", "only showing top 5 rows\n", "\n" ] } ], "source": [ "from pyspark.sql.functions import when\n", "\n", "observations_and_condition_df = (\n", " merged_observations_with_age_df.join(diabetics_df, \"patientid\", \"left_outer\")\n", " .withColumn(\"diabetic\", when(col(\"start\").isNotNull(), 1).otherwise(0))\n", ")\n", "\n", "observations_and_condition_df.show(5)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Filter the observations for diabetics to remove those taken before diagnosis\n", "\n", "This is driven by the way that the diabetes simulation works in Synthea. The impact of the condition (diabetes) is not reflected in the observations until the patient is diagnosed with the condition in a wellness visit. Prior to that the patient's observations won't be any different from a non-diabetic patient. Therefore we want only the observations at the time the patients were diabetic." 
] }, { "cell_type": "code", "execution_count": 15, "metadata": {}, "outputs": [], "source": [ "observations_and_condition_df = (\n", " observations_and_condition_df.filter((col(\"diabetic\") == 0) | ((col(\"dateofobservation\") >= col(\"start\"))))\n", ")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Reduce the observations to a single observation per patient (the earliest available observation)" ] }, { "cell_type": "code", "execution_count": 16, "metadata": {}, "outputs": [], "source": [ "from pyspark.sql.window import Window\n", "from pyspark.sql.functions import row_number\n", "\n", "w = Window.partitionBy(observations_and_condition_df[\"patientid\"]).orderBy(observations_and_condition_df[\"dateofobservation\"].asc())\n", "\n", "first_observation_df = observations_and_condition_df.withColumn(\"rn\", row_number().over(w)).where(col(\"rn\") == 1).drop(\"rn\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Visualize data\n", "\n", "At this point we have collected some observations which might be relevant to making a diabetes prediction. The next step is to look for relationships between those observations and having diabetes. There are many tools that help visualize data to look for relationships. One of the easiest to use is PixieDust (https://github.com/pixiedust/pixiedust).\n", "\n", "Install the PixieDust visualization tool if it is not already available (uncomment the command below)." ] }, { "cell_type": "code", "execution_count": 17, "metadata": {}, "outputs": [], "source": [ "# !pip install --upgrade pixiedust" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Use PixieDust to visualize whether observations correlate with diabetes\n", "\n", "The PixieDust interactive widget appears when you run this cell.\n", "* Click the chart button and choose Scatter Plot.\n", "* Click the chart options button. Drag \"ldl\" into the Keys box and drag \"hdl\" into the Values box.\n", "Set the # of Rows to Display to 5000. 
Click OK to close the chart options.\n", "* Select bokeh from the Renderer dropdown menu.\n", "* Select diabetic from the Color dropdown menu.\n", "\n", "The scatter plot chart appears.\n", "\n", "Click Options and try replacing \"ldl\" and \"hdl\" with other attributes." ] }, { "cell_type": "code", "execution_count": 18, "metadata": { "pixiedust": { "displayParams": { "chartsize": "100", "color": "diabetic", "handlerId": "scatterPlot", "keyFields": "ldl", "rendererId": "bokeh", "rowCount": "1000", "valueFields": "hdl" } } }, "outputs": [ { "data": { "text/html": [ "
[PixieDust scatter plot output, Bokeh renderer: hdl plotted against ldl, colored by diabetic. Open the notebook in Jupyter to view the interactive chart.]
" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "import pixiedust\n", "\n", "display(first_observation_df)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Build and train the model\n", "\n", "The visualization of the data showed that the strongest predictors of diabetes are the cholesterol observations. This is an artifact of the diabetes simulation used to create the synthesized data. The simulation uses a distinct range of HDL readings for diabetic vs. non-diabetic patients.\n", "\n", "The simulation increases the chance of high blood pressure (hypertension) for diabetics but the non-diabetic patients also can have high blood pressure. Therefore the correlation of high blood pressure to diabetes isn't very strong.\n", "\n", "The simulation does not change the weight of any diabetic patients so BMI has no correlation.\n", "\n", "Let's continue using HDL and systolic blood pressure as the features for the model. In reality more features would be needed to build a usable model.\n", "\n", "Create a pipeline that assembles the feature columns and runs a logistic regression algorithm. Then use the observation data to train the model." ] }, { "cell_type": "code", "execution_count": 19, "metadata": {}, "outputs": [], "source": [ "from pyspark.ml.feature import VectorAssembler\n", "from pyspark.ml.classification import LogisticRegression\n", "from pyspark.ml import Pipeline\n", "\n", "vectorAssembler_features = VectorAssembler(inputCols=[\"hdl\", \"systolic\"], outputCol=\"features\")\n", "\n", "lr = LogisticRegression(featuresCol = 'features', labelCol = 'diabetic', maxIter=10)\n", "\n", "pipeline = Pipeline(stages=[vectorAssembler_features, lr])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Split the observation data into two portions\n", "\n", "The larger portion (80% of the data) is used to train the model.\n", "The smaller portion (20% of the data) is used to test the model." 
] }, { "cell_type": "code", "execution_count": 20, "metadata": {}, "outputs": [], "source": [ "split_data = first_observation_df.randomSplit([0.8, 0.2], 24)\n", "train_data = split_data[0]\n", "test_data = split_data[1]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Train the model" ] }, { "cell_type": "code", "execution_count": 21, "metadata": {}, "outputs": [], "source": [ "model = pipeline.fit(train_data)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Evaluate the model\n", "\n", "One way to evaluate the model is to plot a precision/recall curve.\n", "\n", "Precision measures the percentage of the predicted true outcomes that are actually true.\n", "\n", "Recall measures the percentage of the actual true conditions that are predicted as true.\n", "\n", "Ideally we want both precision and recall to be 100%.\n", "We want every patient predicted to be diabetic to actually have diabetes (precision = 1.0).\n", "We want all of the actual diabetics to be predicted to be diabetic (recall = 1.0).\n", "\n", "The model computes the probability of a true condition and then compares that to a threshold\n", "(by default 0.5) to make a final true or false determination. The precision/recall curve plots\n", "precision and recall at various threshold values." 
] }, { "cell_type": "code", "execution_count": 22, "metadata": {}, "outputs": [ { "data": { "image/png": "iVBORw0KGgoAAAANSUhEUgAAAYUAAAEKCAYAAAD9xUlFAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADl0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uIDIuMS4wLCBodHRwOi8vbWF0cGxvdGxpYi5vcmcvpW3flQAAIABJREFUeJzt3Xl8VPW9//HXJ/tCEghJWAMRZBFRUALugnUpWiu1WrdatbXVbtrWaq/t7f3Va2+XW9vaq9ZWXKr1ttpqby0qlqpVQQUkFEFAdgKENWELJGT//P6YIU0hkAEycyaZ9/PxmEdmOTN5HxLmnfM9Z77H3B0RERGApKADiIhI/FApiIhIK5WCiIi0UimIiEgrlYKIiLRSKYiISCuVgoiItFIpiIhIK5WCiIi0Sgk6wJEqKCjwkpKSoGOIiHQp8+fPr3L3wo6W63KlUFJSQllZWdAxRES6FDNbF8lyGj4SEZFWKgUREWmlUhARkVYqBRERaaVSEBGRVlErBTN7wsy2mdniQzxuZvaAma0ys0Vmdmq0soiISGSiuaXwJDD5MI9fDAwLX24BfhXFLCIiEoGofU7B3WeaWclhFpkC/NZD5wOdY2Y9zayfu2+ORp555TuYtaLyiJ+XnprMNeOL6d0jPQqpRETiS5AfXhsAbGhzuyJ830GlYGa3ENqaYNCgQUf1zf6xbicPvrHqiJ/nDk+9W84vrh7LmccXHNX3FhHpKoIsBWvnPm9vQXefCkwFKC0tbXeZjtw6cSi3Thx6xM9buqma2575B59+fC5fmjiUb1w4nNRk7Z8Xke4pyHe3CqC4ze2BwKaAshzSqP65vHjb2VxdWszDb67mqkdms2FHbdCxRESiIshSmAbcED4K6XRgd7T2JxyrrLQUfnzFyTx47Sms2rqXS/5nFi8ujLv+EhE5ZlEbPjKzZ4BJQIGZVQDfA1IB3P3XwHTgEmAVUAt8NlpZOsvHx/RnbHFPbn92Abc9s4C3V1bxvctGkZUWm1G4puYWtu6pZ9OufWzatY+N4a+1Dc185vTBnDKoV0xyiEj3ZaGDf7qO0tJSD3qW1MbmFn7x2goefnM1xxVk8+C1p3Bi/7xjft3qusY2b/h1rddDlzq2VNfR3PKvP69eWak0tzjVdU1cenI/vvXRkQzqnXXMWUSkezGz+e5e2uFyKoWj9+6qKr7+h/fZVdvIdy4ZyY1nlmDW3v7zgzW3OB9urmZe+Q7Kyncyr3wH2/bU/8syqclGv7xM+vfMoH/PTAb0zKR/+BK6nkFWWgp765uY+tZqps5aQ3OLc8MZJdz2kePpmZUWjdWOipYWZ09dEztrG9hZ28Cu2kZqG5oZN7gXffMygo4n0uWpFGJk+9567np+EX9fto0LTijiJ1eOIT/74DfjfQ3NvL9hF/PKdzCvfAcL1u9ib30TAAN6ZlJa0otR/XIZ0Oufb/qFPdJJSoqsZAC27K7j568u57n5FeSkp3DbR4Zxw5mDSU9J7rT1jVRNfRPb9tSztbqO7Xv3v9E3sLO2kV21jeHroTf/nbUN7N7XSMshfhVPGdSTi0f35eLR/SjO11aQyNFQKcSQu/Pku+X8aPoyemWncv/VYxnZN5ey8h2UrdvJe2t3sHjjbppaHDMY0SeH0pJejC/Jp7QknwE9Mzs1z7It1fxo+jLeWlFJcX4md310JB8/uV/EWzGH4h4apqrcU8e26nq2hr9u2xO+VNe1fq1paG73NbLSkumVlUbPrFR6ZaWRl5VKr/D1nllp9MxMpVd2Kj2z0khJMmauqOSVxVtYsqkagBP753Lx6L5MHt2P44t6HNP6iCQSlUIAlmzazW3PLGBNZU3rfWnJSYwpzqO0JJ/xJb0YNyifvKzUmOSZtb
KSH05fxoebqxkzMI/vXHICpw3pfdjn1DU2s35HLeVVNazfUcu67bWUb69hw45atlTXUdfYctBzMlOT6ZObTlFOBoW56fTJyaAoN52inHT65GZQ0COdXlmp5GWlHvVWy/rttfx1yWZeWbyFBet3AXB8UY9wQfRlVL/cYy49ke5MpRCQ2oYmHp25ltQUY3xJPicNyCMjNfbDN/s1tzh/XrCRn85YzpbqOi4c1YevXzCMlhYo317TWgDrdtSyfnvojb+tnIwUSnpnM6h3Fv1yM+iTG3rDLwy/4RflpNMjPSWmb8hbdtcxY8kWXlm8mffW7qDFYUhhNvdeNpqzh+lT5yLtUSnIv9jX0MwT76zlV2+ubt2XsV9hTjolvbMYlJ/N4N5Z4Us2g/Oz6JmVGtd/gVftrefVpVt5dNYa1lTW8JnTB/PtS0bG7DBhka5CpSDtqtpbz4wlW+idnU5JQRaD8rO6xRtoXWMz981YzhPvrGVQfhY//dQYxpfkBx1LJG6oFCQhzV2znTufX0jFzn184Zwh3HHh8ECH70TiRaSloJndpFs5bUhv/vq1c7luwiCmzlzDpQ++zaKKXUHHEukyVArS7WSnp/CDy0/iqc9NYG9dE5c//C4/f3UFDU0HHzklIv9KpSDd1sThhcz4xrlMGdufB15fyeUPv8PKrXuCjiUS11QK0q3lZaby86vG8shnxrFldx23/u98utp+NJFYUilIQvjoiX25++KRrKms4b21O4KOIxK3VAqSMD52cj9y0lN4dt6GjhcWSVAqBUkYWWkpTDmlP9M/2Mzu2sag44jEJZWCJJRrxg+ivqmFF97fGHQUkbikUpCEMnpAHqMH5PLMe+u1w1mkHSoFSTjXjB/Esi17WFixO+goInFHpSAJ57Kx/clMTeYP89YHHUUk7qgUJOHkZqTysZP7Me39TdQcMGOsSKJTKUhCunZCMTUNzby4cFPQUUTiikpBEtKpg3oxrKiHPrMgcgCVgiQkM+Pq8cW8v2EXy7ZUBx1HJG6oFCRhXXHqQFKTjRcWaAhJZD+VgiSsXtlpDOiZycZd+4KOIhI3VAqS0Ap6pFO1pz7oGCJxQ6UgCa13jzSq9qoURPZTKUhCK+iRzvaahqBjiMQNlYIktIIe6eysbaCpWafqFAGVgiS4gh5puMMObS2IACoFSXAFPdIBqNqrUhABlYIkuIKc/aWgnc0ioFKQBNc7Ow1QKYjsp1KQhLZ/S2G7ho9EAJWCJLic9BTSUpK0pSASFtVSMLPJZrbczFaZ2d3tPD7IzN4wswVmtsjMLolmHpEDmRkF2WlUqhREgCiWgpklA78ELgZGAdea2agDFvsu8Ed3PwW4Bng4WnlEDqUgJ11HH4mERXNLYQKwyt3XuHsD8Cww5YBlHMgNX88DNF2lxNyIPjnMXl3Fu6urgo4iErholsIAoO0ZTCrC97V1D3C9mVUA04HbophHpF3fvXQUJb2zufXp+SzfsifoOCKBimYpWDv3+QG3rwWedPeBwCXA02Z2UCYzu8XMysysrLKyMgpRJZHlZaby5OcmkJmazGd/8x5bq+uCjiQSmGiWQgVQ3Ob2QA4eHroZ+COAu88GMoCCA1/I3ae6e6m7lxYWFkYpriSyAT0zeeKm8eze18hNv5nHnrrGoCOJBCKapTAPGGZmx5lZGqEdydMOWGY9cD6AmZ1AqBS0KSCBGD0gj4evH8eKrXv48u/+QaMmyZMEFLVScPcm4KvADOBDQkcZLTGze83ssvBi3wS+YGYLgWeAm9z9wCEmkZiZOLyQH11+ErNWVvGd//sA/TpKokmJ5ou7+3RCO5Db3vf/2lxfCpwVzQwiR+qq8cVU7NrHA6+vZFB+FredPyzoSCIxo080i7TjGxcM4xNj+3P/aytYsH5n0HFEYkalINIOM+PeT4ymT24Gdz63kLrG5qAjicSESkHkEHIzUvnxFSezurKG+19bEXQckZhQKYgcxsThhVwzvphHZ67hHxpGkgSgUh
DpwL9/7AT65mZwl4aRJAGoFEQ6kNN2GOlVDSNJ96ZSEInAucMLuXbCIB6dtYbXlm7V5xek24rq5xREupPvXDKSd1dX8fnfljGkIJsrSwdyxakD6ZObEXQ0kU5jXe0vntLSUi8rKws6hiSomvompn+wmefKKnivfAdJBpNGFHFV6UA+MrIPaSna+Jb4ZGbz3b20w+VUCiJHZ21VDc/P38Dz8yvYWl1PfnYak0f35cyhvTntuN4Uhs//LBIPVAoiMdLc4sxcWclzZRt4a3klNQ2hI5SGFmZz+pDenDakN6cfl0+RhpkkQCoFkQA0NbeweFM1c9ZsZ+6a7cwr38ne+iYAhhRkc9bxBdx98Uiy07U7T2Ir0lLQb6ZIJ0pJTmJscU/GFvfkixOH0tTcwpJN1cxdu503l1fy9Jx1TBxeyAWj+gQdVaRd2ismEkUpyUmMKe7JLecO5aHrTgWgfHtNwKlEDk2lIBIjvbJSyclIYd322qCjiBySSkEkRsyMkt7Z2lKQuKZSEImhwb2ztKUgcU2lIBJDJb2zqdhZS0OTzv8s8UmlIBJDg3tn0eKwcde+oKOItEulIBJDJQXZgI5AkvilUhCJoZLeoVJ4vqxC52aQuKRSEImhwpx0vnb+MF7+YDNX/vpdNuzQTmeJLyoFkRj7xoXDeeyGUtZtr+XSB9/m78u2Bh1JpJVKQSQAF4zqw8u3ncPAXpl87skyfjpjOS0tXWseMumeVAoiARnUO4s/felMri4t5qE3VvHwm6uCjiSiCfFEgpSRmsyPrziJ2sZm7n9tJWcMLWDc4F5Bx5IEpi0FkYCZGT+4fDT98jL42rMLqK5rDDqSJDCVgkgcyM1I5YFrT2Hz7jq+++fFdLXznEj3oVIQiROnDurFNy4YxrSFm/jTPzYGHUcSVMSlYGYDzOxMMzt3/yWawUQS0ZcmHc+E4/L5/ktLqW1oCjqOJKCISsHM/ht4B/gucFf4cmcUc4kkpOQk498mj2D3vkb+NL8i6DiSgCI9+ugTwAh3r49mGBEJDSONKe7JE++U8+nTBpOUZEFHkgQS6fDRGiA1mkFEJMTMuPns41hbVcPfl20LOo4kmEi3FGqB983sdaB1a8Hdb49KKpEEd/HovvTPy+Cxt9dwwag+QceRBBJpKUwLX0QkBlKTk7jxzBJ+9Moylm2pZmTf3KAjSYKIaPjI3Z8CngHmhy+/D993WGY22cyWm9kqM7v7EMtcZWZLzWyJmf3+SMKLdGefOGUAAG+vrAo4iSSSiLYUzGwS8BRQDhhQbGY3uvvMwzwnGfglcCFQAcwzs2nuvrTNMsOAbwNnuftOMys62hUR6W765GYwsFcm/1i/M+gokkAiHT76GXCRuy8HMLPhhLYcxh3mOROAVe6+JvycZ4EpwNI2y3wB+KW77wRwd+1VE2mjdHAv3l29HXfHTEchSfRFevRR6v5CAHD3FXR8NNIAYEOb2xXh+9oaDgw3s3fMbI6ZTY4wj0hCGDe4F9v21FOxU+d0ltiItBTKzOxxM5sUvjxKaN/C4bT3Z82BE7qkAMOAScC1wGNm1vOgFzK7xczKzKyssrIywsgiXd+4wfkATP9gs+ZDkpiItBS+BCwBbge+RmgI6IsdPKcCKG5zeyCwqZ1l/uLuje6+FlhOqCT+hbtPdfdSdy8tLCyMMLJI1zeibw4nD8zjR68s49OPzWXJpt1BR5JuzqL114eZpQArgPOBjcA84Dp3X9JmmcnAte5+o5kVAAuAse6+/VCvW1pa6mVlZVHJLBKPGptb+P3c9dz/2gp272vkU+MGcudFIyjKzQg6mnQhZjbf3Us7Wu6wO5rN7I/ufpWZfcDBQz+4+8mHeq67N5nZV4EZQDLwhLsvMbN7gTJ3nxZ+7CIzWwo0A3cdrhBEEtH+zyx8YuwAHnpjJU++W86LCzdzzrACzhleyLnDChjcOzvomNJNHHZLwcz6uftmMxvc3uPuvi5qyQ5BWwqS6NZtr+GRmWt4a3klG3
eFdkAPys/inGEFTBpRxPkjizRfkhwk0i2FiIaPzCwb2OfuLeHDUUcCr7h7zE8RpVIQCXF31lbVMGtlFbNWVjF7dRU1Dc08ffMEzhmmfW/yrzpl+KiNmcA5ZtYLeB0oA64GPn30EUXkWJgZQwp7MKSwBzeeWcIHFbv5+ENvs6+hOeho0oVFevSRuXst8EngQXe/HBgVvVgicqT02TbpDBGXgpmdQWjL4OXwfZFuZYiISBcRaSl8ndAcRX8OH0E0BHgjerFERCQIEf217+5vAW+1ub2G0AfZRCRO5GaEZp55anY5pSX55GenBRtIuqTDbimY2S/CX180s2kHXmITUUQiMah3Fj+8/CTmrd3Jxx6YpdlV5ah0tKXwdPjrT6MdRESO3XWnDeLkgXl86XfzuerXs/nixKGcNiSfkX1zKcxJDzqedAFH/DmF8O1kID18RFJM6XMKIh3bXdvIt/60kBlLtrbeV9AjjRF9cxjZN5cJx+Vz0ag+mo47gXT2h9fmABe4+97w7R7A39z9zGNOeoRUCiKR21HTwLIt1SzbvCf0dcselm/ZQ31TC2OLe/Ifl57QOhOrdG+d/eG1jP2FAODue80s66jTiUhM5GencebQAs4cWtB6X1NzC39esJH7Ziznil/N5mMn9+PbF49kYC/9l5bID0mtMbNT998ws3GAzvoh0gWlJCfxqdJi3rxrEl87fxivf7iVC38+k0feWk1jc0vQ8SRgkQ4fjQee5Z/nQ+gHXO3uHZ1op9Np+Eikc23ctY/v/WUJr324lZF9c/jhJ0/i1EG9go4lnaxT9ymEXzAVGEHojGrLgpgMD1QKItEyY8kW7pm2hM276/jUuIHcNXkERTk6Z0N3EWkpRDR8FN5/8G/A19z9A6DEzC49xowiEkc+emJfXr1jIrdOHMIL72/kvPve5Ndvraa+SRPsJZJI9yn8BmgAzgjfrgD+KyqJRCQwPdJT+PbFJ/C3b0zkjKEF/PiVZXzy4Xep2lsfdDSJkUhLYai7/wRoBHD3fYSGkUSkGzquIJvHbixl6mfGsbpyL1c9MptNu3RsSSKItBQazCyT8Ck5zWwooD8dRLq5i07sy9M3n0ZldT1X/updXlq0iZaW6JzXXeJDpKXwPeCvQLGZ/Y7QiXa+FbVUIhI3xpfk88wtp5OVnsJXf7+Ayf8zk5cXbVY5dFMdHn1koc/BDwRqgdMJDRvNcfeq6Mc7mI4+EglGc4vz8gebeeD1lazatpdLT+7HQ9ed2vETJS502tFHHmqNF9x9u7u/7O4vBVUIIhKc5CTjsjH9mfH1c7n57ON4adFm1lTu7fiJ0qVEOnw0J/wBNhFJcMlJxq0Th5CabPzvnPVBx5FOFuncR+cBXzSzcqCG0BCSu/vJ0QomIvGrKCeDyaP78dvZ5azfUcOUsQO44IQ+ZKYlBx1NjlGkpXBxVFOISJfzn5edSN/cdKYt3MRrH24jOy2ZL593PF+eNFRTcndhh93RbGYZwBeB44EPgMfdvSlG2dqlHc0i8aW5xZm7djtPvVvOjCVbmTK2P/99xclkpGqrIZ501tTZTxH6wNosQlsLo4CvHXs8EekukpOMM4cWcMaQ3jz85mrum7GcHTUNPH3zaUFHk6PQUSmMcveTAMzsceC96EcSka7IzPjKecdTuaeep2aXBx1HjlJHRx+1zoQa9LCRiHQNuRmR7qqUeNTRT2+MmVWHrxuQGb69/+ij3KimE5EuyT10hreU5EiPepd4cdifmLsnu3tu+JLj7iltrqsQROQgYwf1BOB/56wLOIkcDdW4iHSq80YUcfbxBfz81RVU1wVyLi45BioFEelUZsb1pw+iuq6J9dtrg44jR0ilICKdLjlJby1dlX5yItLpUpJDn2h+YcFGnc6zi1EpiEinO2toAVPG9uext9dy6QNvs3RTdcdPkrigUhCRTpeWksT/XHMKv7lpPLv3NXLL02Xs0U7nLiGqpWBmk81suZmtMrO7D7PclWbmZtbhvB
wi0nWcN7KIX11/Kpt27eOOPy5k+16dxTfeRa0UzCwZ+CX/nDPpWjMb1c5yOcDtwNxoZRGR4IwbnM93LjmBvy/bxqSfvsnUmavZsKOWjs76KMGI5ufRJwCr3H0NgJk9C0wBlh6w3PeBnwB3RjGLiATo8+cMYdKIQr7/0of8cPoyfjh9Gf3yMigtyeeKUwcwaURR0BElLJrDRwOADW1uV4Tva2VmpwDF7v7S4V7IzG4xszIzK6usrOz8pCISdccX5fDU5ybw16+fw71TTmTc4F7MWbOdzz9VxrzyHUHHk7BolkJ7Z9lo3V40syTgfuCbHb2Qu09191J3Ly0sLOzEiCISayP75nLDGSU8dN2pvHbHRAb2yuT2ZxbQ0qLhpHgQzVKoAIrb3B4IbGpzOwcYDbwZPs3n6cA07WwWSRx5malcOW4gm3fX0aRSiAvRLIV5wDAzO87M0oBrgGn7H3T33e5e4O4l7l4CzAEuc3edVk0kgejUnfElaqUQPv/CV4EZwIfAH919iZnda2aXRev7iojI0Yvq2TDcfTow/YD7/t8hlp0UzSwiEt+219TTLy8z6BgJT59oFpFAnX9CEVlpyXzuyTLqGjVPUtBUCiISqJF9c7nnshP5cHM1SzRHUuBUCiISuL65GQDaUogDKgURCdzIvjnkZabyHy8s1vxIAVMpiEjginIzePzGUjbu2senH5urYgiQSkFE4kJpST6P3zietVU1XPfoXKpUDIFQKYhI3Dh7WAFP3DSedTtq+OYfFwYdJyGpFEQkrpx1fAEfGVlExc7aoKMkJJWCiMSdfnmZrKmq4fG31+qIpBhTKYhI3LnroyM4b0QR339pKWPv/Ruff2oeizfuDjpWQlApiEjcyUhN5tEbSvnt5yZwdWkx72/YxVWPzOa1pVuDjtbtqRREJC4lJxnnDi/kP6eMZvrt5zC0sAef/20ZU2euDjpat6ZSEJG4V5SbwXNfPINxg3vxzHsbOn6CHDWVgoh0CRmpyYzun8uGHbU8V6ZiiBaVgoh0GXdcNILThuRz1/OLuP2ZBTQ1twQdqdtRKYhIl5GXmcqTn53A588+jmkLNzF/3c6gI3U7KgUR6VJSk5O4anzo9O8/mbFc8yR1MpWCiHQ5w/vk8LNPjWH+up288P6moON0KyoFEemSLj6pL4D2K3QylYKIdGnb9tSzs6YBdw86SregUhCRLiklKYmM1CQef3stp3z/VS74+VvMXFEZdKwuLyXoACIiRyMtJYnXvzmJpZuqKa+q4Xdz13HDE+9x4ag+DCvq0bpccX4Wnzx1AOkpyQGm7Tqsq21ylZaWellZWdAxRCTO1DU28+jMNUydtaZ1ZlV3aGpxTuiXy7SvnkVqcuIOjpjZfHcv7XA5lYKIdFfuzv2vruCBv6/i1olD+MI5QyjokR50rEBEWgqJW5si0u2ZGZ8+fTAXjerD1JlrOOvHf+eNZduCjhXXVAoi0q31yc1g6g2l/O3r51Lf1MKiCp2X4XC0o1lEEsLQwh5kpCbxyMzVLKzYRZ/cdApzMijKSacwJ52inHSKcjPok5NOSgLve1ApiEhCSEoyXvjKWTw+ay2LN1WzqGIX22saOHC36pjinvzlK2cFEzIOqBREJGGM7JvLfZ8a03q7qbmF7TUNbKuuZ9ueOh6btZZFFbtYuqmaUf1zA0wanMTdRhKRhJeSnESf3AxOGpjH+Sf04arxA2lqcS59cBY7axqCjhcIlYKISNjlpwzknstOpMWhpqEp6DiB0PCRiEgb2emht8VLH3ybj4woYnjfHPIyUzlpQB6jB+QFnC76VAoiIm187KR+pCUbf1uyldeXbeP/FmwEoE9uOnO/c0HA6aJPpSAi0kZykjF5dD8mj+6Hu7OvsZnv/WUJr364NehoMaF9CiIih2BmZKWlkJWWTH1jCx8kwAffoloKZjbZzJab2Sozu7udx+8ws6VmtsjMXjezwdHMIyJyNEb2y2VfYzMff+htfvDyUmq78U7oqJ
WCmSUDvwQuBkYB15rZqAMWWwCUuvvJwPPAT6KVR0TkaF07YRDPf/EMBvfO4tFZa5m5oiroSFETzS2FCcAqd1/j7g3As8CUtgu4+xvuXhu+OQcYGMU8IiJHrbQkn8dvDE0yet+MZcwr3xFwouiIZikMADa0uV0Rvu9QbgZeae8BM7vFzMrMrKyyUmdWEpFgHFfQg7s+OoLVlTVc/chs3l3V/bYYolkK1s597Z68wcyuB0qB+9p73N2nunupu5cWFhZ2YkQRkcglJxlfOe94bj77OFocrntsLuf/7E1Wbt0TdLROE81SqACK29weCGw6cCEzuwD4d+Ayd6+PYh4RkU7xH5eO4qXbzmZoYTarK2u48P6ZXP3IbN5Yvo3FG3dTU991d0RH7cxrZpYCrADOBzYC84Dr3H1Jm2VOIbSDebK7r4zkdXXmNRGJF+7O3LU7+K+Xl7J4Y3Xr/Rec0IfHbuzwJGcxFRen4zSzS4BfAMnAE+7+AzO7Fyhz92lm9hpwErA5/JT17n7Z4V5TpSAi8Wj+uh3srGnk/tdW8OHmagb3zmZoYQ9G9s3hC+cMIS8rNdB8cVEK0aBSEJF4tnRTNX9dvJmV2/YyZ812dtY2kpmazIWj+pBkkGQGBvWNLZw2JJ8+uRmMHpDHgJ6ZUc0VaSlomgsRkU40qn9u67kYGptb+Nbzi/hwc+ikPg60uLO3romdtY28/EFokOSMIb155pbTA0z9TyoFEZEoSU1O4v6rx7b72KZd+9hZ28B/vLCYfY3NMU52aCoFEZEA9O+ZSf+emfTISKV6X2PQcVppQjwREWmlUhARCdiW3XU8+c5aZq2sJOiDf1QKIiIBGjswjx01Ddzz4lI+8/h7rKmqCTSPSkFEJEB3XDSC5f81mfuvHgNAXcA7nVUKIiIB238yH4A7n1vE8i3BzaWkUhARiQPjS/KZMrY/H26uZsH6nYHlUCmIiMSB/Ow07r54JHCI6aRjRKUgIhInkpNCZxz49v99wD3TllDfFPv9CyoFEZE4UZSTwU+uPBmAJ98t55bfzqelJbbbDSoFEZE4clVpMdNvP4cxA/N4a0UlJ35vBuVVNWzbUxeT769SEBGJM6P653Lfp8bQNzeDfY3NTPrpm0z4wevMWLIl6t9bcx+JiMSh4X1yeO2bE5m5opJt1XXc8+JSKvdE/+SUKgURkTjVIz2FS07qx+7aRt4r38Gg/Kyof0+VgohInMsQCzr8AAAFx0lEQVTLSuXhT4+LyffSPgUREWmlUhARkVYqBRERaaVSEBGRVioFERFppVIQEZFWKgUREWmlUhARkVYW9Emij5SZVQLrjvLpBUBVJ8bpCrTOiUHrnBiOZZ0Hu3thRwt1uVI4FmZW5u6lQeeIJa1zYtA6J4ZYrLOGj0REpJVKQUREWiVaKUwNOkAAtM6JQeucGKK+zgm1T0FERA4v0bYURETkMLplKZjZZDNbbmarzOzudh5PN7M/hB+fa2YlsU/ZuSJY5zvMbKmZLTKz181scBA5O1NH69xmuSvNzM2syx+pEsk6m9lV4Z/1EjP7fawzdrYIfrcHmdkbZrYg/Pt9SRA5O4uZPWFm28xs8SEeNzN7IPzvscjMTu3UAO7erS5AMrAaGAKkAQuBUQcs82Xg1+Hr1wB/CDp3DNb5PCArfP1LibDO4eVygJnAHKA06Nwx+DkPAxYAvcK3i4LOHYN1ngp8KXx9FFAedO5jXOdzgVOBxYd4/BLgFcCA04G5nfn9u+OWwgRglbuvcfcG4FlgygHLTAGeCl9/HjjfzCyGGTtbh+vs7m+4e2345hxgYIwzdrZIfs4A3wd+AtTFMlyURLLOXwB+6e47Adx9W4wzdrZI1tmB3PD1PGBTDPN1OnefCew4zCJTgN96yBygp5n166zv3x1LYQCwoc3tivB97S7j7k3AbqB3TNJFRyTr3NbNhP7S6Mo6XGczOwUodveXYhksiiL5OQ8HhpvZO2Y2x8wmxyxddESyzvcA15tZBT
AduC020QJzpP/fj0h3PEdze3/xH3iIVSTLdCURr4+ZXQ+UAhOjmij6DrvOZpYE3A/cFKtAMRDJzzmF0BDSJEJbg7PMbLS774pytmiJZJ2vBZ5095+Z2RnA0+F1bol+vEBE9f2rO24pVADFbW4P5ODNydZlzCyF0Cbn4TbX4l0k64yZXQD8O3CZu9fHKFu0dLTOOcBo4E0zKyc09jqti+9sjvR3+y/u3ujua4HlhEqiq4pknW8G/gjg7rOBDEJzBHVXEf1/P1rdsRTmAcPM7DgzSyO0I3naActMA24MX78S+LuH9+B0UR2uc3go5RFChdDVx5mhg3V2993uXuDuJe5eQmg/ymXuXhZM3E4Rye/2C4QOKsDMCggNJ62JacrOFck6rwfOBzCzEwiVQmVMU8bWNOCG8FFIpwO73X1zZ714txs+cvcmM/sqMIPQkQtPuPsSM7sXKHP3acDjhDYxVxHaQrgmuMTHLsJ1vg/oATwX3qe+3t0vCyz0MYpwnbuVCNd5BnCRmS0FmoG73H17cKmPTYTr/E3gUTP7BqFhlJu68h95ZvYMoeG/gvB+ku8BqQDu/mtC+00uAVYBtcBnO/X7d+F/OxER6WTdcfhIRESOkkpBRERaqRRERKSVSkFERFqpFEREpJVKQeQAZtZsZu+b2WIze9HMenby699kZg+Fr99jZnd25uuLHAuVgsjB9rn7WHcfTehzLF8JOpBIrKgURA5vNm0mGzOzu8xsXnge+/9sc/8N4fsWmtnT4fs+Hj5fxwIze83M+gSQX+SIdLtPNIt0FjNLJjR9wuPh2xcRmkdoAqFJyaaZ2bnAdkJzSp3l7lVmlh9+ibeB093dzezzwLcIffpWJG6pFEQOlmlm7wMlwHzg1fD9F4UvC8K3exAqiTHA8+5eBeDu+ydXHAj8ITzXfRqwNibpRY6Bho9EDrbP3ccCgwm9me/fp2DAj8L7G8a6+/Hu/nj4/vbmi3kQeMjdTwJuJTRRm0hcUymIHIK77wZuB+40s1RCk7J9zsx6AJjZADMrAl4HrjKz3uH79w8f5QEbw9dvRKQL0PCRyGG4+wIzWwhc4+5Ph6dmnh2eaXYvcH141s4fAG+ZWTOh4aWbCJ0R7Dkz20ho6u7jglgHkSOhWVJFRKSVho9ERKSVSkFERFqpFEREpJVKQUREWqkURESklUpBRERaqRRERKSVSkFERFr9f0i/G+V6u5xxAAAAAElFTkSuQmCC\n", "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "# Plot the model's precision/recall curve.\n", "\n", "%matplotlib inline\n", "import matplotlib.pyplot as plt\n", "\n", "trainingSummary = model.stages[-1].summary\n", "\n", "pr = trainingSummary.pr.toPandas()\n", "plt.plot(pr['recall'],pr['precision'])\n", "plt.ylabel('Precision')\n", "plt.xlabel('Recall')\n", "plt.show()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let's use the model to make predictions using the test data. We'll leave the threshold for deciding between a true or false result at the default value of 0.5." 
] }, { "cell_type": "code", "execution_count": 23, "metadata": {}, "outputs": [], "source": [ "predictions = model.transform(test_data)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Compute recall and precision on the test predictions to see how well the model performs. Recall is the fraction of diabetic patients that the model identifies, TP / (TP + FN). Precision is the fraction of positive predictions that are correct, TP / (TP + FP)." ] }, { "cell_type": "code", "execution_count": 24, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "True positives = 21\n", "False positives = 9\n", "False negatives = 29\n", "Recall = 0.42\n", "Precision = 0.7\n" ] } ], "source": [ "# Collect the predicted and actual labels into a pandas DataFrame.\n", "pred_and_label = predictions.select(\"prediction\", \"diabetic\").toPandas()\n", "\n", "# Count the rows that fall into each outcome category.\n", "tp = len(pred_and_label[(pred_and_label.prediction == 1) & (pred_and_label.diabetic == 1)])\n", "fp = len(pred_and_label[(pred_and_label.prediction == 1) & (pred_and_label.diabetic == 0)])\n", "fn = len(pred_and_label[(pred_and_label.prediction == 0) & (pred_and_label.diabetic == 1)])\n", "\n", "print(\"True positives = %s\" % tp)\n", "print(\"False positives = %s\" % fp)\n", "print(\"False negatives = %s\" % fn)\n", "\n", "print(\"Recall = %s\" % (tp / (tp + fn)))\n", "print(\"Precision = %s\" % (tp / (tp + fp)))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Publish and deploy the model\n", "\n", "In this section you will store the model in the Watson Machine Learning repository using the repository client, deploy it as a web service, and call the service to make a prediction.\n", "\n", "First, install the client library." 
] }, { "cell_type": "code", "execution_count": 25, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Collecting watson-machine-learning-client\n", "tensorflow 1.3.0 requires tensorflow-tensorboard<0.2.0,>=0.1.0, which is not installed.\n", "pyspark 2.3.0 requires py4j==0.10.6, which is not installed.\n", "Successfully installed certifi-2018.11.29 chardet-3.0.4 docutils-0.14 ibm-cos-sdk-2.4.4 ibm-cos-sdk-core-2.4.4 ibm-cos-sdk-s3transfer-2.4.4 idna-2.8 jmespath-0.9.4 lomond-0.3.3 numpy-1.16.2 pandas-0.24.1 python-dateutil-2.8.0 pytz-2018.9 requests-2.21.0 six-1.12.0 tabulate-0.8.3 tqdm-4.31.1 urllib3-1.24.1 watson-machine-learning-client-1.0.363\n" ] } ], "source": [ "!rm -rf $PIP_BUILD/watson-machine-learning-client\n", "!pip install watson-machine-learning-client --upgrade" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Enter your Watson Machine Learning service instance credentials here\n", "They can be found in the Service Credentials tab of the Watson Machine Learning service instance that you created on IBM Cloud." 
] }, { "cell_type": "code", "execution_count": 26, "metadata": {}, "outputs": [], "source": [ "# Replace the placeholder values with your own service credentials.\n", "wml_credentials = {\n", "    \"url\": \"https://xxx.ibm.com\",\n", "    \"username\": \"xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx\",\n", "    \"password\": \"xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx\",\n", "    \"instance_id\": \"xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx\"\n", "}" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Publish the model to the repository using the client" ] }, { "cell_type": "code", "execution_count": 27, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "model_uid: e3be3fe1-3bd9-4670-b97a-03af983cdb40\n" ] } ], "source": [ "from watson_machine_learning_client import WatsonMachineLearningAPIClient\n", "\n", "client = WatsonMachineLearningAPIClient(wml_credentials)\n", "\n", "# Metadata stored alongside the model in the repository.\n", "model_props = {\n", "    client.repository.ModelMetaNames.NAME: \"diabetes-prediction-1\",\n", "}\n", "\n", "# Store the fitted pipeline model together with its pipeline and training data.\n", "stored_model_details = client.repository.store_model(model, meta_props=model_props, training_data=train_data, pipeline=pipeline)\n", "\n", "# The model UID identifies the stored model when creating a deployment.\n", "model_uid = client.repository.get_model_uid(stored_model_details)\n", "print(\"model_uid:\", model_uid)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Deploy the model as a web service" ] }, { "cell_type": "code", "execution_count": 28, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\n", "\n", "#######################################################################################\n", "\n", "Synchronous deployment creation for uid: 'e3be3fe1-3bd9-4670-b97a-03af983cdb40' started\n", "\n", "#######################################################################################\n", "\n", "\n", "INITIALIZING\n", "DEPLOY_SUCCESS\n", "\n", "\n", "------------------------------------------------------------------------------------------------\n", "Successfully finished deployment creation, deployment_uid='f22520a9-8518-459f-8613-5b50e16b08f2'\n", 
"------------------------------------------------------------------------------------------------\n", "\n", "\n", "https://us-south.ml.cloud.ibm.com/v3/wml_instances/4625e647-f20e-4d7c-b23c-f287445a8f23/deployments/f22520a9-8518-459f-8613-5b50e16b08f2/online\n" ] } ], "source": [ "# Create an online deployment for the stored model.\n", "deployment_details = client.deployments.create(model_uid, 'diabetes-prediction-1 deployment')\n", "\n", "# The scoring URL is the endpoint that accepts prediction requests.\n", "scoring_endpoint = client.deployments.get_scoring_url(deployment_details)\n", "print(scoring_endpoint)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Call the web service to make a prediction from some sample data" ] }, { "cell_type": "code", "execution_count": 29, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "{'values': [[45.0, 156.6, [45.0, 156.6], [-0.3141354817235511, 0.3141354817235511], [0.4221056369793351, 0.5778943630206649], 1.0]], 'fields': ['hdl', 'systolic', 'features', 'rawPrediction', 'probability', 'prediction']}\n" ] } ], "source": [ "# One sample patient: an HDL cholesterol reading and a systolic blood pressure reading.\n", "scoring_payload = {\n", "    \"fields\": [\"hdl\", \"systolic\"],\n", "    \"values\": [[45.0, 156.6]]\n", "}\n", "\n", "score = client.deployments.score(scoring_endpoint, scoring_payload)\n", "\n", "print(score)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The response echoes the input fields (`hdl`, `systolic`) and adds the assembled `features` vector, the `rawPrediction`, the class `probability`, and the final `prediction`. For this sample the model assigns a probability of about 0.58 to class 1 (diabetic), which is above the 0.5 threshold, so the prediction is 1.0." ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.7.2" } }, "nbformat": 4, "nbformat_minor": 1 }