{
"cells": [
{
"cell_type": "markdown",
"metadata": {
"deletable": true,
"editable": true
},
"source": [
"### Connecting to remote Spark through DSX-HI"
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {
"collapsed": false,
"deletable": true,
"editable": true
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"success configuring sparkmagic livy.\n"
]
}
],
"source": [
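"# Load the sparkmagic IPython magics (e.g. %manage_spark)\n",
"%load_ext sparkmagic.magics\n",
"from dsx_core_utils import proxy_util, dsxhi_util\n",
"# Configure sparkmagic to reach Livy through DSX-HI\n",
"proxy_util.configure_proxy_livy()"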
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {
"collapsed": false,
"deletable": true,
"editable": true
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"['https://becks1.fyre.ibm.com:8443/gateway/mjoudsx336-master-1/livy/v1', 'https://becks1.fyre.ibm.com:8443/gateway/mjoudsx336-master-1/livy2/v1', 'https://cdh513edge11.fyre.ibm.com:8443/gateway/mjoudsx336-master-1/livy/v1', 'https://cdh514edge1.fyre.ibm.com:8443/gateway/mjoudsx336-master-1/livy/v1', 'https://cdh515edge1.fyre.ibm.com:8443/gateway/mjoudsx336-master-1/livy/v1', 'https://cdh515edge1.fyre.ibm.com:8443/gateway/mjoudsx336-master-1/livy2/v1', 'https://centos74edge1.fyre.ibm.com:8443/gateway/mjoudsx336-master-1/livy/v1', 'https://centos74edge1.fyre.ibm.com:8443/gateway/mjoudsx336-master-1/livy2/v1', 'https://rated3.fyre.ibm.com:8443/gateway/mjoudsx336-master-1/livy2/v1', 'https://yccdh5.fyre.ibm.com:8443/gateway/mjoudsx336-master-1/livy/v1', 'https://yccdh5.fyre.ibm.com:8443/gateway/mjoudsx336-master-1/livy2/v1', 'https://ycedge1.fyre.ibm.com:8443/gateway/mjoudsx336-master-1/livy/v1', 'https://ycedge1.fyre.ibm.com:8443/gateway/mjoudsx336-master-1/livy2/v1', 'https://zinc1.fyre.ibm.com:8443/gateway/mjoudsx336-master-1/livy/v1', 'https://zinc1.fyre.ibm.com:8443/gateway/mjoudsx336-master-1/livy2/v1']\n"
]
}
],
"source": [
"# Print the Livy endpoints available through DSX-HI\n",
"dsxhi_util.list_livy_endpoints()"
]
},
{
"cell_type": "markdown",
"metadata": {
"deletable": true,
"editable": true
},
"source": [
"### Pushing the Python virtual environment to the cluster using DSX-HI"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {
"collapsed": false,
"deletable": true,
"editable": true
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"{ \"imageId\": \"968c2101554e0d1e0d4fdd3720aaa565a2910cb46f4d7ed61188b6ceeec22930\",\r\n",
" \"scriptCommand\": \"anaconda2/bin/python2.7\",\r\n",
" \"libPaths\": [\"usr/local/spark-2.0.2-bin-hadoop2.7/python\",\"user-home/.scripts/common-helpers/batch/pmml\",\"user-home/.scripts/common-helpers/saas\",\"user-home/_global_/python-2.7\"] }\r\n"
]
}
],
"source": [
"# Show the image descriptor for the dsx-scripted-ml-python2 environment\n",
"!cat /user-home/_global_/.remote-images/dsx-hi/dsx-scripted-ml-python2.json"
]
},
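{
"cell_type": "markdown",
"metadata": {
"deletable": true,
"editable": true
},
"source": [
"The descriptor printed above is plain JSON. As a minimal sketch using only the standard `json` module (the descriptor text is pasted in and trimmed to the `scriptCommand` key rather than read from the DSX file system), the interpreter path that the session properties rely on can be extracted like this:\n",
"\n",
"```python\n",
"import json\n",
"\n",
"# Descriptor text as printed by the !cat cell above, trimmed to one key.\n",
"# Inside DSX one could json.load() the file at that path instead.\n",
"descriptor_text = '{ \"scriptCommand\": \"anaconda2/bin/python2.7\" }'\n",
"descriptor = json.loads(descriptor_text)\n",
"\n",
"# 'scriptCommand' is relative to the root of the unpacked environment archive.\n",
"script_command = descriptor['scriptCommand']\n",
"assert not script_command.startswith('/'), 'expected a relative path'\n",
"print(script_command)  # anaconda2/bin/python2.7\n",
"```\n"
]
},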
{
"cell_type": "markdown",
"metadata": {
"deletable": true,
"editable": true
},
"source": [
"### Create Session Properties\n",
"Using values from `dsx-scripted-ml-python2.json`, we'll need to:\n",
"\n",
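"- (1) Pull the archive from HDFS into the YARN distributed cache with the Spark **--archives** option (the `archives` session property)\n",
"- (2) Override the default `PYSPARK_PYTHON` with the interpreter at the relative path `scriptCommand`, prefixed by the archive's file name (YARN unpacks the archive into a directory with that name)\n",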
"\n",
"---\n",
"\n",
"Example DSX-HI session properties for using the `dsx-scripted-ml-python2.tar.gz` virtual environment:\n",
"```\n",
"{\"proxyUser\": \"user1\", \"archives\": [\"/user/dsxhi/environments/26611bf7fe595f786139d6d2132de070fc813f6a0ef7a4e25857b79c8cd4b565/dsx-scripted-ml-python2.tar.gz\"],\"conf\":{\"spark.yarn.appMasterEnv.PYSPARK_PYTHON\":\"dsx-scripted-ml-python2.tar.gz/anaconda2/bin/python\"}}\n",
"```\n",
"### Files currently on HDFS:\n",
"```\n",
"/user/dsxhi/environments/26611bf7fe595f786139d6d2132de070fc813f6a0ef7a4e25857b79c8cd4b565/dsx-scripted-ml-python2.tar.gz\n",
"/user/dsxhi/environments/pythonAddons/pythonAddons.tar.gz\n",
"```\n"
]
},
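{
"cell_type": "markdown",
"metadata": {
"deletable": true,
"editable": true
},
"source": [
"The two steps above can be sketched in plain Python. This is a minimal sketch, not a DSX-HI API: `build_session_properties` is a hypothetical helper, `user1` is the placeholder user from the example, and the archive path and `scriptCommand` are copied from the cells above. It assumes YARN unpacks the archive into a directory named after the archive file, so the interpreter path becomes `<archive file name>/<scriptCommand>`.\n",
"\n",
"```python\n",
"import json\n",
"\n",
"# Values copied from the cells above; 'user1' is the placeholder user from the example.\n",
"script_command = 'anaconda2/bin/python2.7'\n",
"archive_path = ('/user/dsxhi/environments/'\n",
"                '26611bf7fe595f786139d6d2132de070fc813f6a0ef7a4e25857b79c8cd4b565/'\n",
"                'dsx-scripted-ml-python2.tar.gz')\n",
"proxy_user = 'user1'\n",
"\n",
"\n",
"def build_session_properties(proxy_user, archive_path, script_command):\n",
"    # Hypothetical helper, not part of dsx_core_utils.\n",
"    # Assumption: YARN unpacks the archive into a directory named after the\n",
"    # archive file, so the interpreter lives at '<archive name>/<scriptCommand>'.\n",
"    archive_name = archive_path.rsplit('/', 1)[-1]\n",
"    return {\n",
"        'proxyUser': proxy_user,\n",
"        'archives': [archive_path],  # step (1): ship the archive via --archives\n",
"        'conf': {\n",
"            # step (2): point PYSPARK_PYTHON at the interpreter inside the archive\n",
"            'spark.yarn.appMasterEnv.PYSPARK_PYTHON': archive_name + '/' + script_command,\n",
"        },\n",
"    }\n",
"\n",
"\n",
"props = build_session_properties(proxy_user, archive_path, script_command)\n",
"print(json.dumps(props))\n",
"```\n",
"\n",
"Following `scriptCommand` literally yields `dsx-scripted-ml-python2.tar.gz/anaconda2/bin/python2.7`, while the example above points at `dsx-scripted-ml-python2.tar.gz/anaconda2/bin/python`; the keys and nesting are otherwise the same as in the example properties.\n"
]
},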
{
"cell_type": "code",
"execution_count": 4,
"metadata": {
"collapsed": false,
"deletable": true,
"editable": true
},
"outputs": [
{
"data": {
"application/vnd.jupyter.widget-view+json": {
"model_id": "a7cf4dfb29f849fc9558c1be1c822164",
"version_major": 2,
"version_minor": 0
},
"text/plain": [
"MagicsControllerWidget(children=(Tab(children=(ManageSessionWidget(children=(HTML(value=u'\n'), HTML(value=…"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"Added endpoint https://zinc1.fyre.ibm.com:8443/gateway/mjoudsx336-master-1/livy2/v1\n",
"Starting Spark application\n"
]
},
{
"data": {
"text/html": [
"
ID | YARN Application ID | Kind | State | Spark UI | Driver log | Current session? |
---|---|---|---|---|---|---|
913 | application_1533478912530_0775 | pyspark | idle | Link | Link | ✔ |