In this notebook, we take environmental observations, build a linear model for binary classification of the environment state, and derive the coefficients needed to implement the model on the sensor device.

In [None]:

# @hidden_cell
# The following code contains the credentials for a connection in your Project.
# You might want to remove those credentials before you share your notebook.
credentials_1 = {
    'username': '<Cloudant Username>',
    'password': """<Cloudant Password>""",
    'custom_url': '<Cloudant URL>',
    'port': '50000',
}


In [None]:
!pip install cloudant

Use the credentials to connect to the Cloudant service instance

In [None]:
from cloudant import Cloudant
u = credentials_1['username']
p = credentials_1['password']
a = credentials_1['username']
client = Cloudant(u, p, account=a, connect=True, auto_renew=True)

Connect to your IoT event store within Cloudant, and check the number of documents available

In [None]:
eventstore = 'training'
db = client[eventstore]
db.doc_count()

Read a subset of the records available -- if the event store holds many thousands of entries, there may be insufficient memory available to load them all

The `include_docs=True` argument is necessary; otherwise only the list of document ids is returned.

In [None]:
loadlimit = 1000
alldocs = db.all_docs(limit=loadlimit, include_docs=True)
len(alldocs['rows'])

Look at the first event/observation document, and select the features within the "doc" key that you want to include in modelling

In [None]:
alldocs['rows'][0]

In this case, the features of interest are `temp`, `humidity`, and `class`. The timestamp `time` is useful for spotting trends, time-based anomalies, etc.

Iterate the returned documents into an array of events with common schema

In [None]:
events = []
for r in alldocs['rows']:
    doc = r["doc"]
    obs = [doc['time'],doc['temp'],doc['humidity'],doc['class']]
    events.append(obs)
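
The loop above assumes every returned row carries all four fields. As a minimal sketch of a more defensive variant, the helper below skips rows (such as design documents) that lack any of them. The `sample_rows` data and the `extract_events` name are hypothetical and only imitate the shape of `alldocs['rows']`.

```python
# Hypothetical rows imitating the shape of alldocs['rows'] (not real data)
sample_rows = [
    {"id": "a", "doc": {"time": 1, "temp": 21.5, "humidity": 40.0, "class": 0}},
    {"id": "b", "doc": {"time": 2, "temp": 30.1, "humidity": 75.0}},  # no "class"
    {"id": "_design/x", "doc": {"_id": "_design/x", "views": {}}},    # design doc
]

# Field names as used by the extraction loop in this notebook
FIELDS = ("time", "temp", "humidity", "class")


def extract_events(rows, fields=FIELDS):
    """Return one [time, temp, humidity, class] list per complete document."""
    events = []
    for row in rows:
        doc = row.get("doc") or {}
        # Keep the document only if every expected field is present
        if all(field in doc for field in fields):
            events.append([doc[field] for field in fields])
    return events


events_clean = extract_events(sample_rows)
print(events_clean)
```

With real data, `extract_events(alldocs['rows'])` could stand in for the loop, at the cost of silently dropping incomplete observations.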

The events are now loaded in a form that can be converted into a dataframe, which will be used for subsequent steps

In [None]:
import pandas as pd
df = pd.DataFrame(data=events,columns=["timestamp","temperature","humidity","class"])
display(df)

Let's take a look at some of the features over time. We'll use [Matplotlib](https://matplotlib.org/) for visualisation

In [None]:
import matplotlib.pyplot as plt

In [None]:
plt.scatter(df['timestamp'],df['temperature'])

In [None]:
plt.scatter(df['timestamp'],df['humidity'])

Now let's take our data, apply a basic classification, and generate a linear model to derive the coefficients

In [None]:
from sklearn import linear_model
import random
from scipy.special import expit

In [None]:
aX = []
aY = []
for i, row in df.iterrows():
    t= row["temperature"]
    h= row["humidity"]
    c= row["class"]
    obs = [t,h]
    aX.append(obs)
    aY.append([c])

Now split the features from the class

In [None]:
import pandas as pd
X = pd.DataFrame(data=aX,columns=["temperature","humidity"])
y = pd.DataFrame(data=aY,columns=["class"])
display(y)
display(X)

The scikit-learn package provides a comprehensive set of tools for splitting data and for building and validating models

First we split the input data into a training set and a test set, each made up of a feature subset (X) and a class subset (y)

In [None]:
# split X and y into training and testing sets
from sklearn.model_selection import train_test_split

# fraction of the input data held back for testing (excluded from training)
testsplit = 0.25

X_train,X_test,y_train,y_test = train_test_split(X,y,test_size=testsplit,random_state=0)

In [None]:
X_train

Use the default Logistic Regression function to train based on the input observations

In [None]:
from sklearn.linear_model import LogisticRegression
# instantiate the model (using the default parameters)
logreg = LogisticRegression()

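# fit the model with data; ravel() flattens the single-column y
# into the 1-D array that fit() expects, avoiding a conversion warning
logreg.fit(X_train, y_train.values.ravel())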

Generate the predictions for the test data based on the generated model

In [None]:
# generate the predictions from the test subset
y_pred=logreg.predict(X_test)

At this stage, we can compare the actual class values with the predicted values. This generates a "confusion matrix", which shows how well the model predicts each class and where it gets it wrong (false positives, false negatives)

In [None]:
from sklearn import metrics
cnf_matrix = metrics.confusion_matrix(y_test, y_pred)
cnf_matrix

In [None]:
print("Accuracy:",metrics.accuracy_score(y_test, y_pred))
print("Precision:",metrics.precision_score(y_test, y_pred))
print("Recall:",metrics.recall_score(y_test, y_pred))
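
The three scores above follow directly from the confusion matrix. For a binary problem, scikit-learn lays the matrix out as `[[TN, FP], [FN, TP]]` (rows are actual classes, columns are predicted classes). The sketch below recomputes the scores by hand. The counts in `example_cm` are made up for illustration and `binary_metrics` is a hypothetical helper name.

```python
def binary_metrics(cm):
    """Compute (accuracy, precision, recall) from a 2x2 confusion matrix.

    cm is laid out as [[TN, FP], [FN, TP]], matching the layout that
    sklearn.metrics.confusion_matrix uses for binary labels 0 and 1.
    """
    (tn, fp), (fn, tp) = cm
    total = tn + fp + fn + tp
    accuracy = (tp + tn) / total
    # Guard against division by zero when a class is never predicted or never present
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return accuracy, precision, recall


# Illustrative counts only, not results from this notebook
example_cm = [[50, 10], [5, 35]]
acc, prec, rec = binary_metrics(example_cm)
print("Accuracy:", acc)
print("Precision:", prec)
print("Recall:", rec)
```

Passing `cnf_matrix` to the same helper should reproduce the values printed by `metrics.accuracy_score`, `metrics.precision_score`, and `metrics.recall_score`.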

In [None]:
help(logreg)

The model contains the coefficients that can be applied to features to generate the class -- these can be copied and applied to the edge device algorithm

In [None]:
logreg.coef_

In [None]:
logreg.intercept_
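
To show how these values might be used on the device, here is a minimal sketch of logistic regression inference in plain Python, with no scikit-learn dependency. The three constants are placeholders, not results from this notebook: copy the real values from `logreg.coef_[0]` (ordered as temperature, humidity, matching the columns of `X`) and `logreg.intercept_[0]`. The function names are hypothetical.

```python
import math

# Placeholder values for illustration only; replace them with the numbers
# printed by logreg.coef_[0] and logreg.intercept_[0] after training.
COEF_TEMPERATURE = 0.8
COEF_HUMIDITY = -0.3
INTERCEPT = -5.0


def predict_probability(temperature, humidity):
    """Return the modelled probability that the observation is class 1."""
    z = INTERCEPT + COEF_TEMPERATURE * temperature + COEF_HUMIDITY * humidity
    # Numerically stable sigmoid: never exponentiates a large positive number
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_class(temperature, humidity, threshold=0.5):
    """Return 1 if the probability exceeds the threshold, otherwise 0."""
    return 1 if predict_probability(temperature, humidity) > threshold else 0


print(predict_class(25.0, 40.0))
print(predict_class(10.0, 40.0))
```

On a microcontroller the same arithmetic ports to C or MicroPython: one multiply-add per feature plus a single exponential, or just the sign of `z` if only the class is needed.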