{ "metadata": { "kernelspec": { "name": "sparkrkernel", "display_name": "Spark | R" }, "language_info": { "name": "sparkR", "mimetype": "text/x-rsrc", "codemirror_mode": "text/x-rsrc", "pygments_lexer": "r" } }, "nbformat_minor": 2, "nbformat": 4, "cells": [ { "cell_type": "markdown", "source": [ "# Using MsSQLSpark Conenctor from SparkR\r\n", "\r\n", "The following notebook demostrates usage of MSSQLSpark connector in sparkR. For full set of capability supported by MSSQLSpark Connector refer **mssql_spark_connector_non_ad_pyspark.ipynb** " ], "metadata": { "azdata_cell_guid": "f4f5a3ce-b06b-4bcd-be5d-ae7e2fcd6108" } }, { "cell_type": "markdown", "source": [ "## PreReq\r\n", "-------\r\n", "- Download [AdultCensusIncome.csv]( https://amldockerdatasets.azureedge.net/AdultCensusIncome.csv ) to your local machine. Upload this file to hdfs folder named *spark_data*. \r\n", "- The sample uses a SQL database *connector_test_db*, user *connector_user* with password *password123!#* and datasource *connector_ds*. The database, user/password and datasource need to be created before running the full sample. Refer **data-virtualization/mssql_spark_connector_user_creation.ipynb** on steps to create this user." ], "metadata": { "azdata_cell_guid": "820bc224-56e1-4867-9df8-ddd9c3aa3a93" } }, { "cell_type": "markdown", "source": [ "## Load a CSV file to a dataframe" ], "metadata": { "azdata_cell_guid": "29e3c7f0-b2d0-4456-949d-b80dc6857486" } }, { "cell_type": "code", "source": [ "people <- read.df(\"/spark_data/AdultCensusIncome.csv\", \"csv\")\r\n", "head(people)" ], "metadata": { "azdata_cell_guid": "6be47cc0-0a41-4fbe-b43d-bf363e3c4c0a" }, "outputs": [ { "name": "stdout", "text": "Starting Spark application\n", "output_type": "stream" }, { "data": { "text/plain": "", "text/html": "\n
IDYARN Application IDKindStateSpark UIDriver logCurrent session?
5application_1572997761827_0009sparkridleLinkLink
" }, "metadata": {}, "output_type": "display_data" }, { "name": "stdout", "text": "SparkSession available as 'spark'.\n", "output_type": "stream" }, { "name": "stdout", "text": " _c0 _c1 _c2 _c3 _c4 _c5\n1 age workclass fnlwgt education education-num marital-status\n2 39 State-gov 77516 Bachelors 13 Never-married\n3 50 Self-emp-not-inc 83311 Bachelors 13 Married-civ-spouse\n4 38 Private 215646 HS-grad 9 Divorced\n5 53 Private 234721 11th 7 Married-civ-spouse\n6 28 Private 338409 Bachelors 13 Married-civ-spouse\n _c6 _c7 _c8 _c9 _c10 _c11\n1 occupation relationship race sex capital-gain capital-loss\n2 Adm-clerical Not-in-family White Male 2174 0\n3 Exec-managerial Husband White Male 0 0\n4 Handlers-cleaners Not-in-family White Male 0 0\n5 Handlers-cleaners Husband Black Male 0 0\n6 Prof-specialty Wife Black Female 0 0\n _c12 _c13 _c14\n1 hours-per-week native-country income\n2 40 United-States <=50K\n3 13 United-States <=50K\n4 40 United-States <=50K\n5 40 United-States <=50K\n6 40 Cuba <=50K", "output_type": "stream" } ], "execution_count": 3 }, { "cell_type": "markdown", "source": [ "## User MSQL Spark connector to save the dataframe as a table in SQL Server" ], "metadata": { "azdata_cell_guid": "f8304a0e-e3ef-4666-8f15-c1abddac9b25" } }, { "cell_type": "code", "source": [ "#Using deafault JDBC connector\r\n", "#Using MSSQLSpark connector\r\n", "dbname = \"connector_test_db\"\r\n", "url = paste(\"jdbc:sqlserver://master-0.master-svc;databaseName=\", \"connector_test_db\", sep=\"\")\r\n", "print(url)\r\n", "\r\n", "dbtable = \"AdultCensus_test_sparkr\"\r\n", "user = \"connector_user\"\r\n", "password = \"password123!#\" # Please specify password here\r\n", "\r\n", "saveDF(people,\r\n", " dbtable,\r\n", " source = \"com.microsoft.sqlserver.jdbc.spark\", \r\n", " url = url, \r\n", " dbtable=dbtable, \r\n", " mode = \"overwrite\",\r\n", " user=user, \r\n", " password=password)\r\n", "" ], "metadata": { "azdata_cell_guid": "7216f3a5-0439-44a8-a726-f393db2d612b" }, "outputs": [ { "name": "stdout", "text": "[1] \"jdbc:sqlserver://master-0.master-svc;databaseName=connector_test_db\"", "output_type": "stream" } ], "execution_count": 5 } ] }