{"paragraphs":[{"text":"%md\n\n## Intro to Machine Learning\n#### with Linear Regression\n\n**Level**: Beginner\n**Language**: Scala\n**Requirements**: \n- [HDP 2.6](http://hortonworks.com/products/sandbox/) (or later) or [HDCloud](https://hortonworks.github.io/hdp-aws/)\n- Spark 2.x\n\n**Author**: Robert Hryniewicz\n**Follow** [@RobH8z](https://twitter.com/RobertH8z)","user":"admin","dateUpdated":"2017-06-13T18:48:29+0000","config":{"tableHide":false,"editorSetting":{"editOnDblClick":true,"language":"markdown"},"colWidth":12,"editorMode":"ace/mode/markdown","editorHide":true,"title":false,"results":[{"graph":{"mode":"table","height":300,"optionOpen":false,"keys":[],"values":[],"groups":[],"scatter":{}}}],"enabled":true},"settings":{"params":{},"forms":{}},"results":{"code":"SUCCESS","msg":[{"type":"HTML","data":"
\n

Intro to Machine Learning

\n

with Linear Regression

\n

Level: Beginner
Language: Scala
Requirements:
- HDP 2.6 (or later) or HDCloud
- Spark 2.x

\n

Author: Robert Hryniewicz
Follow @RobH8z

\n
"}]},"apps":[],"jobName":"paragraph_1487794229481_-407807426","id":"20161021-175215_279569041","dateCreated":"2017-02-23T01:40:29+0000","dateStarted":"2017-06-13T18:48:29+0000","dateFinished":"2017-06-13T18:48:30+0000","status":"FINISHED","progressUpdateIntervalMs":500,"focus":true,"$$hashKey":"object:4027"},{"title":"Intro","text":"%md\n\nIn this lab we'll cover the basics of building a Linear Regression model using the Apache Spark ML Pipeline API. \n\n- Starting from a simple two-dimensional array\n- Using the Pipeline API to create a vectorised version of the features and build the model\n- Using the Pipeline API to calculate predictions\n- Exchanging data between Scala and Python pandas via TempView (new API in 2.x)\n- Simplified plotting with the pandas plot function (a pandas DataFrame is similar to a Spark DataFrame)\n- Saving the model and loading it back","dateUpdated":"2017-02-23T01:40:29+0000","config":{"editorSetting":{},"colWidth":12,"editorMode":"ace/mode/markdown","editorHide":true,"title":true,"results":[{"graph":{"mode":"table","height":300,"optionOpen":false,"keys":[],"values":[],"groups":[],"scatter":{}}}],"enabled":true},"settings":{"params":{},"forms":{}},"results":{"code":"SUCCESS","msg":[{"type":"HTML","data":"
\n

In this lab we’ll cover the basics of building a Linear Regression model using the Apache Spark ML Pipeline API.

\n\n
"}]},"apps":[],"jobName":"paragraph_1487794229481_-407807426","id":"20161021-175322_1250309450","dateCreated":"2017-02-23T01:40:29+0000","status":"READY","errorMessage":"","progressUpdateIntervalMs":500,"$$hashKey":"object:4028"},{"title":"New to Scala?","text":"%md\n\nIn this lab we will use basic Scala syntax. If you would like to learn more about Scala, here's an excellent **[Tutorial](http://www.dhgarrette.com/nlpclass/scala/basics.html)**.","dateUpdated":"2017-02-23T01:40:29+0000","config":{"editorSetting":{},"colWidth":12,"editorMode":"ace/mode/markdown","editorHide":true,"title":true,"results":[{"graph":{"mode":"table","height":300,"optionOpen":false,"keys":[],"values":[],"groups":[],"scatter":{}}}],"enabled":true},"settings":{"params":{},"forms":{}},"results":{"code":"SUCCESS","msg":[{"type":"HTML","data":"
\n

In this lab we will use basic Scala syntax. If you would like to learn more about Scala, here’s an excellent Tutorial.

\n
"}]},"apps":[],"jobName":"paragraph_1487794229481_-407807426","id":"20161021-175356_201029376","dateCreated":"2017-02-23T01:40:29+0000","status":"READY","errorMessage":"","progressUpdateIntervalMs":500,"$$hashKey":"object:4029"},{"title":"How to run a paragraph?","text":"%md\nTo run a paragraph in a Zeppelin notebook you can either click the `play` button (blue triangle) on the right-hand side or simply press `Shift + Enter`.","dateUpdated":"2017-02-23T01:40:29+0000","config":{"editorSetting":{},"colWidth":12,"editorMode":"ace/mode/scala","editorHide":true,"title":true,"results":[{"graph":{"mode":"table","height":300,"optionOpen":false,"keys":[],"values":[],"groups":[],"scatter":{}}}],"enabled":true},"settings":{"params":{},"forms":{}},"results":{"code":"SUCCESS","msg":[{"type":"HTML","data":"
\n

To run a paragraph in a Zeppelin notebook you can either click the play button (blue triangle) on the right-hand side or simply press Shift + Enter.

\n
"}]},"apps":[],"jobName":"paragraph_1487794229481_-407807426","id":"20161021-175756_1740792557","dateCreated":"2017-02-23T01:40:29+0000","status":"READY","errorMessage":"","progressUpdateIntervalMs":500,"$$hashKey":"object:4030"},{"title":"What is a model?","text":"%md\n\nA **model** is a **mathematical formula** with a number of parameters that need to be learned from the data. **Fitting a model to the data** is a process known as **model training**.\n\nTake, for instance, linear regression with one feature (variable), where the goal is to fit a line (described by the well-known equation `y = ax + b`) to a set of data points.\n\nFor example, assume that once model training is complete we get a model equation `y = 2x + 5`. Then for a set of inputs `[1, 0, 7, 2, …]` we would get a set of outputs `[7, 5, 19, 9, …]`. That's it!\n\nIn this notebook you will learn, step by step, how to train a one-variable linear regression model with Spark.","dateUpdated":"2017-02-23T01:40:29+0000","config":{"editorSetting":{},"colWidth":12,"editorMode":"ace/mode/markdown","editorHide":true,"title":true,"results":[{"graph":{"mode":"table","height":300,"optionOpen":false,"keys":[],"values":[],"groups":[],"scatter":{}}}],"enabled":true},"settings":{"params":{},"forms":{}},"results":{"code":"SUCCESS","msg":[{"type":"HTML","data":"
\n

A model is a mathematical formula with a number of parameters that need to be learned from the data. Fitting a model to the data is a process known as model training.

\n

Take, for instance, linear regression with one feature (variable), where the goal is to fit a line (described by the well-known equation y = ax + b) to a set of data points.

\n

For example, assume that once model training is complete we get a model equation y = 2x + 5. Then for a set of inputs [1, 0, 7, 2, …] we would get a set of outputs [7, 5, 19, 9, …]. That’s it!
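
As a quick illustration of the example above, the following minimal Scala sketch applies the assumed model equation y = 2x + 5 to the sample inputs. The predict function is a hypothetical helper introduced only for this illustration; it is not part of Spark or of this lab's code.

```scala
// Hypothetical helper: the already-trained model from the example, y = 2x + 5.
def predict(x: Double): Double = 2.0 * x + 5.0

// The sample inputs from the text above.
val inputs = Seq(1.0, 0.0, 7.0, 2.0)

// Applying the model to every input yields the predicted outputs.
val outputs = inputs.map(predict)

println(outputs) // prints List(7.0, 5.0, 19.0, 9.0)
```

Training is the step that finds the two numbers (here 2 and 5); prediction is just evaluating the formula with them.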

\n

In this notebook you will learn, step by step, how to train a one-variable linear regression model with Spark.

\n
"}]},"apps":[],"jobName":"paragraph_1487794229482_-406653179","id":"20161021-181247_1205160838","dateCreated":"2017-02-23T01:40:29+0000","status":"READY","errorMessage":"","progressUpdateIntervalMs":500,"$$hashKey":"object:4031"},{"title":"Why Linear Regression?","text":"%md\n\nWe're introducing Machine Learning with **Linear Regression** because it's one of the more basic and **commonly used predictive analytics methods**. It's also easy to explain and to grasp intuitively as you make your way through the examples.\n\nNote that we will not cover the details of how the underlying Linear Regression algorithm works. We will focus only on applying the algorithm and generating a model. If you would like to learn more about Linear Regression and other algorithms, check out this excellent [Coursera Machine Learning Course](https://www.coursera.org/learn/machine-learning) taught by Andrew Ng.","dateUpdated":"2017-02-23T01:40:29+0000","config":{"tableHide":false,"editorSetting":{"editOnDblClick":true,"language":"markdown"},"colWidth":12,"editorMode":"ace/mode/markdown","editorHide":true,"title":true,"results":[{"graph":{"mode":"table","height":300,"optionOpen":false,"keys":[],"values":[],"groups":[],"scatter":{}}}],"enabled":true},"settings":{"params":{},"forms":{}},"results":{"code":"SUCCESS","msg":[{"type":"HTML","data":"
\n

We’re introducing Machine Learning with Linear Regression because it’s one of the more basic and commonly used predictive analytics methods. It’s also easy to explain and to grasp intuitively as you make your way through the examples.

\n

Note that we will not cover the details of how the underlying Linear Regression algorithm works. We will focus only on applying the algorithm and generating a model. If you would like to learn more about Linear Regression and other algorithms, check out this excellent Coursera Machine Learning Course taught by Andrew Ng.

\n
"}]},"apps":[],"jobName":"paragraph_1487794229482_-406653179","id":"20161021-175825_163637505","dateCreated":"2017-02-23T01:40:29+0000","status":"READY","errorMessage":"","progressUpdateIntervalMs":500,"$$hashKey":"object:4032"},{"title":"Verify Spark version (should be 2.x)","text":"%spark2.spark\n\nspark.version","dateUpdated":"2017-03-09T14:00:54+0000","config":{"editorSetting":{"editOnDblClick":false,"language":"text"},"colWidth":12,"editorMode":"ace/mode/text","title":true,"results":[{"graph":{"mode":"table","height":300,"optionOpen":false,"keys":[],"values":[],"groups":[],"scatter":{}}}],"enabled":true},"settings":{"params":{},"forms":{}},"apps":[],"jobName":"paragraph_1487794229482_-406653179","id":"20161023-082330_1254378286","dateCreated":"2017-02-23T01:40:29+0000","status":"READY","errorMessage":"","progressUpdateIntervalMs":500,"$$hashKey":"object:4033"},{"title":"Create a small data set that we will use for our Linear Regression model","text":"%spark2.spark\n\nimport org.apache.spark.ml.linalg.Vectors\n\nval data = spark.createDataFrame(Seq(\n\t(-12.0, -4.9),\n\t( -6.0, -4.5),\n\t( -7.2, -4.1),\n\t( -5.0, -3.2),\n\t( -2.0, -3.0),\n\t( -3.1, -2.1),\n\t( -4.0, -1.5),\n\t( -2.2, -1.2),\n\t( -2.0, -0.7),\n\t( 1.0, -0.5),\n\t( -0.7, -0.2),\n\t( 1.2, 0.1),\n\t( 2.2, 0.3), \n\t( 6.5, 0.52),\n\t( 4.2, 0.72),\n\t( 8.6, 1.1),\n\t( 9.5, 2.3),\n\t( 14.52, 3.4),\n\t( 12.9, 3.61), \n\t( 16.3, 3.8)\n)).toDF(\"y\", 
\"x\")","dateUpdated":"2017-03-09T14:00:55+0000","config":{"editorSetting":{"editOnDblClick":false,"language":"text"},"colWidth":12,"editorMode":"ace/mode/text","editorHide":false,"title":true,"results":[{"graph":{"mode":"table","height":300,"optionOpen":false,"keys":[],"values":[],"groups":[],"scatter":{}}}],"enabled":true},"settings":{"params":{},"forms":{}},"apps":[],"jobName":"paragraph_1487794229482_-406653179","id":"20161023-063018_227184425","dateCreated":"2017-02-23T01:40:29+0000","status":"READY","errorMessage":"","progressUpdateIntervalMs":500,"$$hashKey":"object:4034"},{"title":"Run Linear Regression","text":"%spark2.spark\n\nimport org.apache.spark.ml.Pipeline\nimport org.apache.spark.ml.feature.VectorAssembler\nimport org.apache.spark.ml.regression.{LinearRegression, LinearRegressionModel}\n\n// Set Features\nval features = new VectorAssembler()\n .setInputCols(Array(\"x\"))\n .setOutputCol(\"features\")\n\nval linreg = new LinearRegression().setLabelCol(\"y\")\n \nval pipeline = new Pipeline().setStages(Array(features, linreg))\nval model = pipeline.fit(data)","dateUpdated":"2017-03-09T14:00:55+0000","config":{"editorSetting":{"editOnDblClick":false,"language":"text"},"colWidth":12,"editorMode":"ace/mode/text","title":true,"results":[{"graph":{"mode":"table","height":300,"optionOpen":false,"keys":[],"values":[],"groups":[],"scatter":{}}}],"enabled":true},"settings":{"params":{},"forms":{}},"apps":[],"jobName":"paragraph_1487794229483_-407037928","id":"20161023-063047_142266605","dateCreated":"2017-02-23T01:40:29+0000","status":"READY","errorMessage":"","progressUpdateIntervalMs":500,"$$hashKey":"object:4035"},{"title":"Summarize model training","text":"%spark2.spark\n\nval linRegModel = model.stages(1).asInstanceOf[LinearRegressionModel]\n\nprintln(s\"RMSE: ${linRegModel.summary.rootMeanSquaredError}\")\nprintln(s\"r2: ${linRegModel.summary.r2}\")\nprintln(s\"Model: Y = ${linRegModel.coefficients(0)} * X + 
${linRegModel.intercept}\")\n\nlinRegModel.summary.residuals.show()","dateUpdated":"2017-03-09T14:00:55+0000","config":{"editorSetting":{"editOnDblClick":false,"language":"text"},"colWidth":12,"editorMode":"ace/mode/text","title":true,"results":[{"graph":{"mode":"table","height":300,"optionOpen":false,"keys":[],"values":[],"groups":[],"scatter":{}}}],"enabled":true},"settings":{"params":{},"forms":{}},"apps":[],"jobName":"paragraph_1487794229483_-407037928","id":"20161023-065504_1972452148","dateCreated":"2017-02-23T01:40:29+0000","status":"READY","errorMessage":"","progressUpdateIntervalMs":500,"$$hashKey":"object:4036"},{"title":"Use the model to make predictions on the same data","text":"%spark2.spark\n\nval result = model.transform(data).select(\"x\", \"y\", \"prediction\")\n\nresult.show()","dateUpdated":"2017-03-09T14:00:55+0000","config":{"editorSetting":{"editOnDblClick":false,"language":"text"},"colWidth":12,"editorMode":"ace/mode/text","title":true,"results":[{"graph":{"mode":"table","height":300,"optionOpen":false,"keys":[],"values":[],"groups":[],"scatter":{}}}],"enabled":true},"settings":{"params":{},"forms":{}},"apps":[],"jobName":"paragraph_1487794229483_-407037928","id":"20161104-232822_1626397932","dateCreated":"2017-02-23T01:40:29+0000","status":"READY","errorMessage":"","progressUpdateIntervalMs":500,"$$hashKey":"object:4037"},{"title":"Create a Temporary 
View","text":"%spark2.spark\n\nresult.createOrReplaceTempView(\"linreg\")","dateUpdated":"2017-03-09T14:00:55+0000","config":{"editorSetting":{"editOnDblClick":false,"language":"text"},"colWidth":12,"editorMode":"ace/mode/text","title":true,"results":{},"graph":{"mode":"table","height":300,"optionOpen":false,"keys":[],"values":[],"groups":[],"scatter":{}},"enabled":true},"settings":{"params":{},"forms":{}},"apps":[],"jobName":"paragraph_1487794229483_-407037928","id":"20161104-232946_1293428390","dateCreated":"2017-02-23T01:40:29+0000","status":"READY","errorMessage":"","progressUpdateIntervalMs":500,"$$hashKey":"object:4038"},{"title":"Save the model","text":"%spark2.spark\n\n// Save the fitted model (linRegModel), not the unfitted linreg estimator\nlinRegModel.write.overwrite().save(\"hdfs:///tmp/linregmodel\")","dateUpdated":"2017-03-09T14:00:55+0000","config":{"editorSetting":{"editOnDblClick":false,"language":"text"},"colWidth":12,"editorMode":"ace/mode/text","editorHide":false,"title":true,"results":[],"enabled":true},"settings":{"params":{},"forms":{}},"apps":[],"jobName":"paragraph_1487794229483_-407037928","id":"20161019-185407_1496443931","dateCreated":"2017-02-23T01:40:29+0000","status":"READY","errorMessage":"","progressUpdateIntervalMs":500,"$$hashKey":"object:4039"},{"title":"Load back the model","text":"%spark2.spark\n\nimport org.apache.spark.ml.regression.LinearRegressionModel\n\n// Load the fitted model back from HDFS\nval sameLinRegModel = LinearRegressionModel.load(\"hdfs:///tmp/linregmodel\")\n\n// Verify coefficients and intercept\nprintln(s\"Coefficient: ${sameLinRegModel.coefficients} Intercept: 
${sameLinRegModel.intercept}\")","dateUpdated":"2017-03-09T14:00:55+0000","config":{"editorSetting":{"editOnDblClick":false,"language":"text"},"colWidth":12,"editorMode":"ace/mode/text","editorHide":false,"title":true,"results":[{"graph":{"mode":"table","height":300,"optionOpen":false,"keys":[],"values":[],"groups":[],"scatter":{}}}],"enabled":true},"settings":{"params":{},"forms":{}},"apps":[],"jobName":"paragraph_1487794229484_-408961673","id":"20161019-185706_496188641","dateCreated":"2017-02-23T01:40:29+0000","status":"READY","errorMessage":"","progressUpdateIntervalMs":500,"$$hashKey":"object:4040"},{"text":"%md\n\n#### Visualise the model and training data\n\n**Note**: The following paragraphs require the Python **Pandas** library, which is not installed by default. We have therefore run the paragraphs for you and disabled **run** so that you avoid any errors. ","dateUpdated":"2017-02-23T01:40:29+0000","config":{"tableHide":false,"editorSetting":{"editOnDblClick":true,"language":"markdown"},"colWidth":12,"editorMode":"ace/mode/markdown","editorHide":true,"title":false,"results":[{"graph":{"mode":"table","height":300,"optionOpen":false,"keys":[],"values":[],"groups":[],"scatter":{}}}],"enabled":true},"settings":{"params":{},"forms":{}},"results":{"code":"SUCCESS","msg":[{"type":"HTML","data":"
\n

Visualise the model and training data

\n

Note: The following paragraphs require the Python Pandas library, which is not installed by default. We have therefore run the paragraphs for you and disabled run so that you avoid any errors.

\n
"}]},"apps":[],"jobName":"paragraph_1487794229484_-408961673","id":"20161104-232912_1326325430","dateCreated":"2017-02-23T01:40:29+0000","status":"READY","errorMessage":"","progressUpdateIntervalMs":500,"$$hashKey":"object:4041"},{"title":"Convert to Pandas (requires Pandas)","text":"%spark2.pyspark\n\nlinreg = spark.table(\"linreg\").toPandas()\nlinreg","dateUpdated":"2017-02-23T01:40:29+0000","config":{"tableHide":true,"editorSetting":{"editOnDblClick":false,"language":"text"},"colWidth":12,"editorMode":"ace/mode/text","editorHide":false,"title":true,"results":[{"graph":{"mode":"table","height":300,"optionOpen":false,"keys":[],"values":[],"groups":[],"scatter":{}}}],"enabled":false},"settings":{"params":{},"forms":{}},"results":{"code":"SUCCESS","msg":[{"type":"TEXT","data":"requires Pandas library\n"}]},"apps":[],"jobName":"paragraph_1487794229484_-408961673","id":"20161104-233336_1115890215","dateCreated":"2017-02-23T01:40:29+0000","status":"READY","errorMessage":"","progressUpdateIntervalMs":500,"$$hashKey":"object:4042"},{"title":"Plot the result (requires Pandas)","text":"%spark2.pyspark\nimport StringIO\n\ndef show(p):\n img = StringIO.StringIO()\n p.get_figure().savefig(img, format='svg')\n img.seek(0)\n print \"%html
\" + img.buf + \"
\"\n\nplot = linreg.plot.scatter(x='x', y='y')\n \nplot.plot(linreg[\"x\"], linreg[\"prediction\"])\nshow(plot)","dateUpdated":"2017-02-23T01:40:29+0000","config":{"editorSetting":{"editOnDblClick":false,"language":"text"},"colWidth":12,"editorMode":"ace/mode/text","title":true,"results":[{"graph":{"mode":"table","height":300,"optionOpen":false,"keys":[],"values":[],"groups":[],"scatter":{}}}],"enabled":false},"settings":{"params":{},"forms":{}},"results":{"code":"SUCCESS","msg":[{"type":"HTML","data":"
[Figure: scatter plot of the training data (y against x) with the fitted regression line (prediction against x) overlaid]
\n"}]},"apps":[],"jobName":"paragraph_1487794229484_-408961673","id":"20161104-233454_1462963013","dateCreated":"2017-02-23T01:40:29+0000","status":"READY","errorMessage":"","progressUpdateIntervalMs":500,"$$hashKey":"object:4043"},{"title":"More ML Algorithms","text":"%md\n\nIn this lab we have looked at Linear Regression, but there are other popular algorithms. In the following labs we'll begin exploring:\n\n- [Decision trees](https://spark.apache.org/docs/latest/ml-classification-regression.html#decision-trees)\n- [Random forest](https://spark.apache.org/docs/latest/ml-classification-regression.html#random-forests)\n- [K-Means Clustering](https://spark.apache.org/docs/latest/ml-clustering.html#k-means)","dateUpdated":"2017-02-23T01:40:29+0000","config":{"tableHide":false,"editorSetting":{"editOnDblClick":true,"language":"markdown"},"colWidth":12,"editorMode":"ace/mode/markdown","editorHide":true,"title":true,"results":[{"graph":{"mode":"table","height":300,"optionOpen":false,"keys":[],"values":[],"groups":[],"scatter":{}}}],"enabled":true},"settings":{"params":{},"forms":{}},"results":{"code":"SUCCESS","msg":[{"type":"HTML","data":"
\n

In this lab we have looked at Linear Regression, but there are other popular algorithms. In the following labs we’ll begin exploring:

\n\n
"}]},"apps":[],"jobName":"paragraph_1487794229485_-409346422","id":"20161021-181337_384523728","dateCreated":"2017-02-23T01:40:29+0000","status":"READY","errorMessage":"","progressUpdateIntervalMs":500,"$$hashKey":"object:4044"},{"title":"Additional Resources","text":"%md\n\nWe hope you've enjoyed this introductory lab. Below are additional resources that you should find useful:\n\n1. [Hortonworks Apache Spark Tutorials](http://hortonworks.com/tutorials/#tuts-developers) are your natural next step where you can explore Spark in more depth.\n2. [Hortonworks Community Connection (HCC)](https://community.hortonworks.com/spaces/85/data-science.html?type=question) is a great resource for questions and answers on Spark, Data Analytics/Science, and many more Big Data topics.\n3. [Hortonworks Apache Spark Docs](http://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.5.0/bk_spark-component-guide/content/ch_developing-spark-apps.html) - official Spark documentation.\n4. [Hortonworks Apache Zeppelin Docs](http://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.5.0/bk_zeppelin-component-guide/content/ch_using_zeppelin.html) - official Zeppelin documentation.","dateUpdated":"2017-02-23T01:40:29+0000","config":{"tableHide":false,"editorSetting":{"editOnDblClick":true,"language":"markdown"},"colWidth":10,"editorMode":"ace/mode/markdown","editorHide":true,"title":true,"results":[{"graph":{"mode":"table","height":300,"optionOpen":false,"keys":[],"values":[],"groups":[],"scatter":{}}}],"enabled":true},"settings":{"params":{},"forms":{}},"results":{"code":"SUCCESS","msg":[{"type":"HTML","data":"
\n

We hope you’ve enjoyed this introductory lab. Below are additional resources that you should find useful:

\n
1. Hortonworks Apache Spark Tutorials are your natural next step where you can explore Spark in more depth.
2. Hortonworks Community Connection (HCC) is a great resource for questions and answers on Spark, Data Analytics/Science, and many more Big Data topics.
3. Hortonworks Apache Spark Docs - official Spark documentation.
4. Hortonworks Apache Zeppelin Docs - official Zeppelin documentation.
\n
"}]},"apps":[],"jobName":"paragraph_1487794229485_-409346422","id":"20161021-162613_1357875353","dateCreated":"2017-02-23T01:40:29+0000","status":"READY","errorMessage":"","progressUpdateIntervalMs":500,"$$hashKey":"object:4045"},{"text":"%angular\n
","dateUpdated":"2017-02-23T01:40:29+0000","config":{"editorSetting":{},"colWidth":2,"editorMode":"ace/mode/scala","editorHide":true,"results":[{"graph":{"mode":"table","height":300,"optionOpen":false,"keys":[],"values":[],"groups":[],"scatter":{}}}],"enabled":true},"settings":{"params":{},"forms":{}},"results":{"code":"SUCCESS","msg":[{"type":"ANGULAR","data":"
"}]},"apps":[],"jobName":"paragraph_1487794229485_-409346422","id":"20161021-182558_90195999","dateCreated":"2017-02-23T01:40:29+0000","status":"READY","errorMessage":"","progressUpdateIntervalMs":500,"$$hashKey":"object:4046"},{"text":"","dateUpdated":"2017-02-23T01:40:29+0000","config":{"editorSetting":{"editOnDblClick":false,"language":"text"},"colWidth":12,"editorMode":"ace/mode/text","results":{},"graph":{"mode":"table","height":300,"optionOpen":false,"keys":[],"values":[],"groups":[],"scatter":{}},"enabled":true},"settings":{"params":{},"forms":{}},"apps":[],"jobName":"paragraph_1487794229486_-408192175","id":"20161021-182620_1556029654","dateCreated":"2017-02-23T01:40:29+0000","status":"READY","errorMessage":"","progressUpdateIntervalMs":500,"$$hashKey":"object:4047"}],"name":"Labs / Spark 2.x / Data Scientist / Scala / 101 - Intro to Machine Learning","id":"2CCBNZ5YY","angularObjects":{"2C9J4X9BB:shared_process":[],"2C97XTJFE:shared_process":[],"2C9BD8WCX:shared_process":[],"2CBT85YD7:shared_process":[],"2C8RGTKC3:shared_process":[],"2CBQNWPMD:shared_process":[],"2C8JDGPHH:shared_process":[],"2C9CSKWHY:shared_process":[],"2CBN9WPNN:shared_process":[],"2CB11VTD7:shared_process":[],"2C9Z4TVBW:shared_process":[],"2CB3RUCX8:shared_process":[],"2C9PSG7XP:shared_process":[],"2C8PPBWFC:shared_process":[],"2C95B7UJY:shared_process":[],"2CB91QEZG:shared_process":[],"2CAPDMDA1:shared_process":[],"2CACTG458:shared_process":[],"2CAD4U2BW:shared_process":[],"2CBTJTHZE:shared_process":[],"2C9VPGHR9:shared_process":[]},"config":{"looknfeel":"default","personalizedMode":"false"},"info":{}}