{"paragraphs":[{"text":"%md\n#Download the data locally and upload it to HDFS","dateUpdated":"Mar 15, 2016 4:48:13 PM","config":{"colWidth":12,"graph":{"mode":"table","height":300,"optionOpen":false,"keys":[],"values":[],"groups":[],"scatter":{}},"enabled":true,"editorMode":"ace/mode/markdown"},"settings":{"params":{},"forms":{}},"jobName":"paragraph_1458060467840_1277221303","id":"20160315-164747_95796257","result":{"code":"SUCCESS","type":"HTML","msg":"
There is no such method on RDDs, unlike Scala collections.\n
Even if there were, RDDs are partitioned, and a drop() call would run on every partition, dropping the first element of each one. In only one partition would that element be the header; in every other partition it would be a real data row.
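One way to work with partitioning rather than against it is to drop the first element of the first partition only. The following is a minimal sketch, not necessarily the approach this lab takes: dropHeader is a hypothetical helper name, and it assumes the header is the first line of partition 0, which is the usual case when a single text file is read in order.

```scala
import org.apache.spark.rdd.RDD

// Hypothetical helper (not part of the lab): removes the header line,
// assuming it is the first element of partition 0.
def dropHeader(lines: RDD[String]): RDD[String] =
  lines.mapPartitionsWithIndex { (index, iter) =>
    // Only partition 0 starts with the header; all other partitions
    // are passed through unchanged.
    if (index == 0) iter.drop(1) else iter
  }
```

Given an RDD of raw lines, here called rawRDD as an assumed name, `val noHeaderRDD = dropHeader(rawRDD)` yields the data rows only. Filtering out every line that equals the header text is a common alternative when the header's position is not guaranteed.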
DC_DIST (integer): District number\n
SECTOR (integer): Sector or PSA Number\n
DISPATCH_DATE (date string): Date of Incident (modified from original data)\n
DISPATCH_TIME (time string): Time of Incident (modified from original data)\n
DC_KEY (text): Unique ID of each crime\n
UCR_General (integer): Rounded Crime Code\n
TEXT_GENERAL_CODE (string): Human-readable Crime Code\n
OBJECTID (integer): Unique row ID\n
POINT_X (decimal): Latitude where crime occurred\n
POINT_Y (decimal): Longitude where crime occurred
TAKE NOTE: We are deliberately keeping only the fields we need for this lab. There's no sense dragging around more data than we need.
\n"},"dateCreated":"Mar 15, 2016 4:43:11 PM","dateStarted":"Mar 15, 2016 4:47:28 PM","dateFinished":"Mar 15, 2016 4:47:28 PM","status":"FINISHED","progressUpdateIntervalMs":500,"$$hashKey":"object:39"},{"text":"case class CrimeData(dateString: String,\n timeString: String,\n offense: String,\n latitude: String,\n longitude: String)\nval dataRDD = noHeaderRDD.map { line =>\n val cols = line.split(\",\")\n CrimeData(dateString = cols(10), //DISPATCH_DATE\n timeString = cols(11), //DISPATCH_TIME\n offense = cols(6), //TEXT_GENERAL_CODE\n latitude = cols(7), //POINT_X\n longitude = cols(8)) //POINT_Y\n}\ndataRDD.take(10).foreach(println)","dateUpdated":"Mar 14, 2016 10:14:48 PM","config":{"enabled":true,"graph":{"mode":"table","height":300,"optionOpen":false,"keys":[],"values":[],"groups":[],"scatter":{}},"colWidth":12},"settings":{"params":{},"forms":{}},"jobName":"paragraph_1457988919606_-1038404646","id":"20160314-205519_1624180723","result":{"code":"SUCCESS","type":"TEXT","msg":"defined class CrimeData\ndataRDD: org.apache.spark.rdd.RDD[CrimeData] = MapPartitionsRDD[9] at map athttp://docs.scala-lang.org/overviews/core/string-interpolation.html
\n"},"dateCreated":"Mar 15, 2016 4:51:33 PM","dateStarted":"Mar 15, 2016 4:52:26 PM","dateFinished":"Mar 15, 2016 4:52:26 PM","status":"FINISHED","progressUpdateIntervalMs":500,"$$hashKey":"object:44"},{"text":"val offenseCounts = dataRDD.map(item => (item.offense, 1)).countByKey()\nfor ((offense, count) <- offenseCounts) {\n println(f\"$offense%30s $count%5d\")\n}","dateUpdated":"Mar 15, 2016 2:21:47 PM","config":{"enabled":true,"graph":{"mode":"table","height":300,"optionOpen":false,"keys":[],"values":[],"groups":[],"scatter":{}},"colWidth":12,"lineNumbers":true},"settings":{"params":{},"forms":{}},"jobName":"paragraph_1457988919608_-1040713140","id":"20160314-205519_1085288873","result":{"code":"SUCCESS","type":"TEXT","msg":"offenseCounts: scala.collection.Map[String,Long] = Map(129342978 -> 1, Rape -> 1061, Burglary Residential -> 5585, Motor Vehicle Theft -> 1916, Burglary Non-Residential -> 1251, Theft from Vehicle -> 10608, 129338613 -> 1, Thefts -> 19619, Recovered Stolen Motor Vehicle -> 5731, \"Homicide - Criminal \" -> 40, Aggravated Assault Firearm -> 1940, Robbery No Firearm -> 3220, Robbery Firearm -> 2384, Homicide - Criminal -> 183, Aggravated Assault No Firearm -> 4634)\n 129342978 1\n Rape 1061\n Burglary Residential 5585\n Motor Vehicle Theft 1916\n Burglary Non-Residential 1251\n Theft from Vehicle 10608\n 129338613 1\n Thefts 19619\nRecovered Stolen Motor Vehicle 5731\n \"Homicide - Criminal \" 40\n Aggravated Assault Firearm 1940\n Robbery No Firearm 3220\n Robbery Firearm 2384\n Homicide - Criminal 183\n Aggravated Assault No Firearm 4634\n"},"dateCreated":"Mar 14, 2016 8:55:19 PM","dateStarted":"Mar 15, 2016 2:21:47 PM","dateFinished":"Mar 15, 2016 2:21:55 PM","status":"FINISHED","progressUpdateIntervalMs":500,"$$hashKey":"object:45"},{"text":"%md\n#ETL\n\n###There's some junk in our data. 
Let's clean it up a bit.","dateUpdated":"Mar 15, 2016 4:53:13 PM","config":{"colWidth":12,"graph":{"mode":"table","height":300,"optionOpen":false,"keys":[],"values":[],"groups":[],"scatter":{}},"enabled":true,"editorMode":"ace/mode/scala"},"settings":{"params":{},"forms":{}},"jobName":"paragraph_1458060775700_1045459091","id":"20160315-165255_1932985900","result":{"code":"SUCCESS","type":"HTML","msg":"