{"cells":[{"metadata":{},"cell_type":"markdown","source":["In this notebook, we use environmental observations to build a linear model that enables (binary) classification of the environment state, and derive the coefficients for implementing the model on the sensor device."]},{"metadata":{},"cell_type":"code","source":["\n","# @hidden_cell\n","# The following code contains the credentials for a connection in your Project.\n","# You might want to remove those credentials before you share your notebook.\n","credentials_1 = {\n","    'username': '<Cloudant Username>',\n","    'password': \"\"\"<Cloudant Password>\"\"\",\n","    'custom_url': '<Cloudant URL>',\n","    'port': '50000',\n","}\n"],"execution_count":null,"outputs":[]},{"metadata":{},"cell_type":"code","source":["!pip install cloudant"],"execution_count":null,"outputs":[]},{"metadata":{},"cell_type":"markdown","source":["Use the credentials to connect to the Cloudant service instance."]},{"metadata":{},"cell_type":"code","source":["from cloudant import Cloudant\n","u = credentials_1['username']\n","p = credentials_1['password']\n","a = credentials_1['username']\n","client = Cloudant(u, p, account=a, connect=True, auto_renew=True)"],"execution_count":null,"outputs":[]},{"metadata":{},"cell_type":"markdown","source":["Connect to your IoT event store within Cloudant, and check the number of documents available."]},{"metadata":{},"cell_type":"code","source":["eventstore = 'training'\n","db = client[eventstore]\n","db.doc_count()"],"execution_count":null,"outputs":[]},{"metadata":{},"cell_type":"markdown","source":["Read a subset of the records available -- if the event store holds many thousands of entries, there may be insufficient memory available to load them all.\n","\n","The `include_docs=True` argument is necessary; otherwise only the list of document IDs is returned."]},{"metadata":{},"cell_type":"code","source":["loadlimit = 1000\n","alldocs = db.all_docs(limit=loadlimit, include_docs= 
True)\n","len(alldocs['rows'])"],"execution_count":null,"outputs":[]},{"metadata":{},"cell_type":"markdown","source":["Look at the first event/observation document, and select the features within the \"doc\" key that you want to include in modelling."]},{"metadata":{},"cell_type":"code","source":["alldocs['rows'][0]"],"execution_count":null,"outputs":[]},{"metadata":{},"cell_type":"markdown","source":["In this case, the features of interest are temperature (`temp`), `humidity`, and `class`. The timestamp (`time`) will be useful for spotting trends, time-based anomalies, etc.\n","\n","Iterate over the returned documents to build an array of events with a common schema."]},{"metadata":{},"cell_type":"code","source":["events = []\n","for r in alldocs['rows']:\n","    doc = r[\"doc\"]\n","    obs = [doc['time'], doc['temp'], doc['humidity'], doc['class']]\n","    events.append(obs)"],"execution_count":null,"outputs":[]},{"metadata":{},"cell_type":"markdown","source":["The events are now loaded in a form that can be converted into a dataframe, which will be used for subsequent steps."]},{"metadata":{},"cell_type":"code","source":["import pandas as pd\n","df = pd.DataFrame(data=events, columns=[\"timestamp\", \"temperature\", \"humidity\", \"class\"])\n","display(df)"],"execution_count":null,"outputs":[]},{"metadata":{},"cell_type":"markdown","source":["Let's take a look at some of the features over time. 
We'll use [Matplotlib](https://matplotlib.org/) for visualisation."]},{"metadata":{},"cell_type":"code","source":["import matplotlib.pyplot as plt"],"execution_count":null,"outputs":[]},{"metadata":{},"cell_type":"code","source":["plt.scatter(df['timestamp'], df['temperature'])"],"execution_count":null,"outputs":[]},{"metadata":{},"cell_type":"code","source":["plt.scatter(df['timestamp'], df['humidity'])"],"execution_count":null,"outputs":[]},{"metadata":{},"cell_type":"markdown","source":["Now let's take our data, apply a basic classification, and generate a linear model to derive coefficients."]},{"metadata":{},"cell_type":"code","source":["from sklearn import linear_model\n","import random\n","from scipy.special import expit"],"execution_count":null,"outputs":[]},{"metadata":{},"cell_type":"code","source":["aX = []\n","aY = []\n","for i, row in df.iterrows():\n","    t = row[\"temperature\"]\n","    h = row[\"humidity\"]\n","    c = row[\"class\"]\n","    obs = [t, h]\n","    aX.append(obs)\n","    aY.append([c])"],"execution_count":null,"outputs":[]},{"metadata":{},"cell_type":"markdown","source":["Now split the features from the class."]},{"metadata":{},"cell_type":"code","source":["import pandas as pd\n","X = pd.DataFrame(data=aX, columns=[\"temperature\", \"humidity\"])\n","y = pd.DataFrame(data=aY, columns=[\"class\"])\n","display(y)\n","display(X)"],"execution_count":null,"outputs":[]},{"metadata":{},"cell_type":"markdown","source":["The scikit-learn package provides a comprehensive set of tools for splitting data, and for building and validating models.\n","\n","First, we split the input data into two groups (features and class), each with two subsets: a training set and a test set."]},{"metadata":{},"cell_type":"code","source":["# split X and y into training and testing sets\n","from sklearn.model_selection import train_test_split\n","\n","# fraction of input data to hold back for testing -- excluded from the training\n","testsplit = 0.25\n","\n","X_train,X_test,y_train,y_test = 
train_test_split(X,y,test_size=testsplit,random_state=0)"],"execution_count":null,"outputs":[]},{"metadata":{},"cell_type":"code","source":["X_train"],"execution_count":null,"outputs":[]},{"metadata":{},"cell_type":"markdown","source":["Train a logistic regression model, using scikit-learn's default parameters, on the training observations."]},{"metadata":{},"cell_type":"code","source":["from sklearn.linear_model import LogisticRegression\n","# instantiate the model (using the default parameters)\n","logreg = LogisticRegression()\n","\n","# fit the model with data; ravel() flattens the single-column y into the 1-D array that fit() expects\n","logreg.fit(X_train, y_train.values.ravel())"],"execution_count":null,"outputs":[]},{"source":["Generate the predictions for the test data based on the generated model."],"cell_type":"markdown","metadata":{}},{"cell_type":"code","execution_count":null,"metadata":{},"outputs":[],"source":["# generate the predictions from the test subset\n","y_pred = logreg.predict(X_test)"]},{"metadata":{},"cell_type":"markdown","source":["At this stage, we can compare the actual class values with the predicted values. This generates a \"confusion matrix\", which shows how well the model predicts each class and where it gets it wrong (false positives, false negatives)."]},{"metadata":{},"cell_type":"code","source":["from sklearn import metrics\n","cnf_matrix = metrics.confusion_matrix(y_test, y_pred)\n","cnf_matrix"],"execution_count":null,"outputs":[]},{"metadata":{},"cell_type":"code","source":["print(\"Accuracy:\", metrics.accuracy_score(y_test, y_pred))\n","print(\"Precision:\", metrics.precision_score(y_test, y_pred))\n","print(\"Recall:\", metrics.recall_score(y_test, y_pred))"],"execution_count":null,"outputs":[]},{"metadata":{},"cell_type":"code","source":["help(logreg)"],"execution_count":null,"outputs":[]},{"metadata":{},"cell_type":"markdown","source":["The fitted model holds the coefficients and intercept that are applied to the features to predict the class -- these can be copied and applied to the edge device 
algorithm"]},{"metadata":{},"cell_type":"code","source":["logreg.coef_"],"execution_count":null,"outputs":[]},{"metadata":{},"cell_type":"code","source":["logreg.intercept_"],"execution_count":null,"outputs":[]}],"metadata":{"kernelspec":{"name":"python3","display_name":"Python 3","language":"python"},"language_info":{"name":"python","version":"3.8.5-final","mimetype":"text/x-python","codemirror_mode":{"name":"ipython","version":3},"pygments_lexer":"ipython3","nbconvert_exporter":"python","file_extension":".py"}},"nbformat":4,"nbformat_minor":1}