{"paragraphs":[{"text":"%md\n## Predicting Likelihood of Building Exceeding Threshold Temperatures","dateUpdated":"2016-02-04T08:12:17+0000","config":{"graph":{"mode":"table","height":300,"optionOpen":false,"keys":[],"values":[],"groups":[],"scatter":{}},"editorMode":"ace/mode/scala","colWidth":12,"editorHide":true,"enabled":true},"settings":{"params":{},"forms":{}},"jobName":"paragraph_1453148289227_938056786","id":"20160118-201809_1566845983","result":{"code":"SUCCESS","type":"HTML","msg":"
Your machines know things. From connected cars, to equipment in the field, to machines on the assembly line floor—sensors stream low-cost, always-on data. Hadoop makes it easier for you to store and refine that data and identify meaningful patterns, providing you with the insight to make proactive business decisions using predictive analytics.
\nIn this tutorial, we show how Hadoop can be used to analyze heating, ventilation, and air conditioning (HVAC) data to maintain ideal office temperatures and minimize expenses.
\nUsing the data cleansing and transformation steps mentioned in the tutorial below, we can extend the analysis by applying logistic regression to the generated data. We will then be able to predict, based on features of the building and its heating system, which buildings are likely to exceed their target temperature; a sketch of that modeling step appears at the end of this notebook.
\n\n"},"dateCreated":"2016-02-04T07:43:21+0000","dateStarted":"2016-02-04T08:12:13+0000","dateFinished":"2016-02-04T08:12:13+0000","status":"FINISHED","progressUpdateIntervalMs":500,"$$hashKey":"object:512"},{"text":"%sh\n\nwget http://s3.amazonaws.com/hw-sandbox/tutorial14/SensorFiles.zip\nunzip SensorFiles.zip\nhadoop fs -mkdir -p /user/zeppelin/SensorDemo\nhadoop fs -copyFromLocal -f SensorFiles/HVAC.csv /user/zeppelin/SensorDemo/\nhadoop fs -tail /user/zeppelin/SensorDemo/HVAC.csv","dateUpdated":"2016-02-04T08:12:17+0000","config":{"graph":{"mode":"table","height":300,"optionOpen":false,"keys":[],"values":[],"groups":[],"scatter":{}},"editorMode":"ace/mode/sh","colWidth":12,"editorHide":true,"enabled":true},"settings":{"params":{},"forms":{}},"jobName":"paragraph_1453146917461_-1687246947","id":"20160118-195517_2008728912","result":{"code":"SUCCESS","type":"TEXT","msg":"--2016-02-03 00:39:29-- http://s3.amazonaws.com/hw-sandbox/tutorial14/SensorFiles.zip\nResolving s3.amazonaws.com... 54.231.14.32\nConnecting to s3.amazonaws.com|54.231.14.32|:80... connected.\nHTTP request sent, awaiting response... 200 OK\nLength: 64777 (63K) [application/zip]\nSaving to: “SensorFiles.zip”\n\n 0K .......... .......... .......... .......... .......... 79% 397K 0s\n 50K .......... ... 100% 9.23M=0.1s\n\n2016-02-03 00:39:29 (497 KB/s) - “SensorFiles.zip” saved [64777/64777]\n\nArchive: SensorFiles.zip\n inflating: SensorFiles/building.csv \n inflating: SensorFiles/ExtremeTemps.hiv \n inflating: SensorFiles/HVAC.csv \n inflating: SensorFiles/JoinBuildingHVAC.hiv \n\r6/17/13,20:43:51,68,70,8,10,20\r6/18/13,21:43:51,70,64,7,8,2\r6/19/13,22:43:51,68,76,19,17,20\r6/20/13,23:43:51,69,80,20,24,20\r6/21/13,0:13:20,66,55,17,25,11\r6/22/13,1:13:20,66,66,18,27,15\r6/23/13,2:13:20,69,74,17,12,18\r6/24/13,3:13:20,68,69,18,9,11\r6/25/13,4:13:20,67,69,13,28,7\r6/26/13,5:13:20,66,66,7,4,4\r6/27/13,6:13:20,69,71,9,24,11\r6/28/13,7:13:20,70,80,1,24,20\r6/29/13,8:13:20,70,79,17,21,12\r6/30/13,9:13:20,70,70,2,29,10\r6/1/13,10:13:20,67,77,9,3,5\r6/2/13,11:13:20,66,66,18,3,7\r6/3/13,12:13:20,68,57,5,7,16\r6/4/13,13:13:20,66,66,2,24,11\r6/5/13,14:13:20,67,68,4,22,20\r6/6/13,15:13:20,67,79,5,8,3\r6/7/13,16:13:20,67,62,9,14,2\r6/8/13,17:13:20,65,75,13,13,10\r6/9/13,18:13:20,67,65,4,17,13\r6/10/13,19:13:20,70,55,20,12,4\r6/11/13,20:13:20,65,68,15,2,14\r6/12/13,21:13:20,68,74,1,30,13\r6/13/13,22:13:20,66,61,4,15,20\r6/14/13,23:13:20,67,55,3,14,14\r6/15/13,0:33:07,70,60,2,9,19\r6/16/13,1:33:07,66,58,17,18,20\r6/17/13,2:33:07,68,72,17,27,12\r6/18/13,3:33:07,68,69,10,4,3\r6/19/13,4:33:07,65,63,7,23,20\r6/20/13,5:33:07,66,66,9,21,3"},"dateCreated":"2016-01-18T07:55:17+0000","dateStarted":"2016-02-03T12:39:28+0000","dateFinished":"2016-02-03T12:39:38+0000","status":"FINISHED","progressUpdateIntervalMs":500,"$$hashKey":"object:513"},{"text":"/* Let's start by creating a RDD for the data we just imported. Once an RDD is created we can create Temporary Table with Schema that allows us to run SparkSQL. 
The results of SQL queries are DataFrames and support all the normal RDD operations. */\n\nval sqlContext = new org.apache.spark.sql.SQLContext(sc)\nimport sqlContext.implicits._ // enables toDF() on RDDs of case classes\n\nval eventsFile = sc.textFile(\"hdfs:///user/zeppelin/SensorDemo/HVAC.csv\")\n\n// One record per HVAC sensor reading in HVAC.csv\ncase class Event(Date: String,\n                 Time: String,\n                 TargetTemp: Float,\n                 ActualTemp: Float,\n                 System: String,\n                 SystemAge: Int,\n                 BuildingID: String)\n\n// Skip the CSV header (the first line of the first partition), then parse each row into an Event\nval eventsRDD = eventsFile\n  .mapPartitionsWithIndex { (idx, iter) => if (idx == 0) iter.drop(1) else iter }\n  .map(_.split(\",\"))\n  .map(s => Event(s(0), s(1), s(2).toFloat, s(3).toFloat, s(4), s(5).toInt, s(6)))\n\neventsRDD.count\n\n// Register the events as a temporary table so they can be queried with SparkSQL\neventsRDD.toDF().registerTempTable(\"tempSeries\")","dateUpdated":"2016-02-04T08:12:17+0000","config":{"graph":{"mode":"table","height":300,"optionOpen":false,"keys":[],"values":[],"groups":[],"scatter":{}},"editorMode":"ace/mode/scala","colWidth":12,"editorHide":true,"enabled":true},"settings":{"params":{},"forms":{}},"jobName":"paragraph_1453148304027_-2024879169","id":"20160118-201824_1746472555","result":{"code":"SUCCESS","type":"TEXT","msg":"sqlContext: org.apache.spark.sql.SQLContext = org.apache.spark.sql.SQLContext@762109a9\neventsFile: org.apache.spark.rdd.RDD[String] = MapPartitionsRDD[57] at textFile at
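"}},{"text":"/* A follow-up sketch, not part of the original notebook: with tempSeries registered, SparkSQL queries against it return DataFrames that support the normal RDD operations. The 5-degree deviation used to flag a reading and the hotCount alias are illustrative assumptions, not values from the tutorial. */\n\nval hotReadings = sqlContext.sql(\n  \"SELECT BuildingID, COUNT(*) AS hotCount \" +\n  \"FROM tempSeries \" +\n  \"WHERE ActualTemp - TargetTemp > 5.0 \" +\n  \"GROUP BY BuildingID\")\n\nhotReadings.show()","config":{"editorMode":"ace/mode/scala","colWidth":12,"enabled":true},"settings":{"params":{},"forms":{}},"status":"READY"},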
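{"text":"/* A hedged sketch, not part of the original notebook, of the logistic regression step promised in the introduction, using the Spark 1.x MLlib API against eventsRDD. The label definition (a reading runs more than 5 degrees over target) and the feature choice (TargetTemp and SystemAge) are assumptions for illustration; the full tutorial may instead join building features from building.csv. */\n\nimport org.apache.spark.mllib.classification.LogisticRegressionWithLBFGS\nimport org.apache.spark.mllib.regression.LabeledPoint\nimport org.apache.spark.mllib.linalg.Vectors\n\n// Label a reading 1.0 when it exceeded the target by more than 5 degrees (assumed threshold)\nval labeled = eventsRDD.map { e =>\n  val label = if (e.ActualTemp - e.TargetTemp > 5.0) 1.0 else 0.0\n  LabeledPoint(label, Vectors.dense(e.TargetTemp.toDouble, e.SystemAge.toDouble))\n}\n\n// Hold out 30% of the readings to check the fit\nval splits = labeled.randomSplit(Array(0.7, 0.3), seed = 42L)\nval (training, test) = (splits(0), splits(1))\n\nval model = new LogisticRegressionWithLBFGS().setNumClasses(2).run(training)\n\n// Fraction of held-out readings classified correctly\nval accuracy = test.map(p => (model.predict(p.features), p.label))\n  .filter { case (pred, actual) => pred == actual }\n  .count.toDouble / test.count","config":{"editorMode":"ace/mode/scala","colWidth":12,"enabled":true},"settings":{"params":{},"forms":{}},"status":"READY"}]}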