{"cells":[{"cell_type":"markdown","source":["Spark Tutorial Setup for Databricks\n------------------------------------\n\nThis notebook sets up the Databricks Community Platform for the Spark Tutorial at <https://github.com/piotrszul/spark-tutorial>.\n\nUse `Run All` to execute all the steps, and verify that none of them report errors.\n\nThen `Restart` the attached cluster so that the init script created below takes effect."],"metadata":{}},{"cell_type":"code","source":["# Create directories in the Databricks filesystem (DBFS)\ndbutils.fs.mkdirs(\"dbfs:/databricks/init\")\ndbutils.fs.mkdirs(\"dbfs:/data\")\ndbutils.fs.mkdirs(\"dbfs:/output\")"],"metadata":{},"outputs":[],"execution_count":2},{"cell_type":"code","source":["%%sh\n\n# Download the tutorial dataset and unpack it into DBFS\nwget -O /tmp/data.tar.gz 'https://github.com/piotrszul/spark-tutorial/releases/download/v0.1-alpha/data.tar.gz'\ntar -xzf /tmp/data.tar.gz -C /dbfs/data"],"metadata":{},"outputs":[],"execution_count":3},{"cell_type":"code","source":["%%sh\n\n# Display the available datasets\nls -lh /dbfs/data"],"metadata":{},"outputs":[],"execution_count":4},{"cell_type":"code","source":["# Create a cluster init script.\n# When the cluster starts, the script creates symbolic links in the driver's working\n# directory (/databricks/driver) that point to the data and output directories in DBFS.\n# The shebang must be the very first line of the file, so the string starts with it directly.\n\ndbutils.fs.put(\"/databricks/init/tutorial-setup.sh\", \"\"\"#!/bin/bash\necho \"Setting up links at /databricks/driver\"\ncd /databricks/driver\nln -s /dbfs/data data\nln -s /dbfs/output output\n\"\"\", True)  # True: overwrite the script if it already exists"],"metadata":{},"outputs":[],"execution_count":5},{"cell_type":"markdown","source":["**ALL DONE.**"],"metadata":{}}],"metadata":{"name":"0.1_Setup_DataBricks","notebookId":1363310445285751},"nbformat":4,"nbformat_minor":0}