{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "## Nodebooks: Introducing Node.js Data Science Notebooks\n", "\n", "Notebooks are where data scientists process, analyse, and visualise data in an iterative, collaborative environment. They typically run environments for languages like Python, R, and Scala. For years, data science notebooks have served academics and research scientists as a scratchpad for writing code, refining algorithms, and sharing and proving their work. Today, it's a workflow that lends itself well to web developers experimenting with data sets in Node.js.\n", "\n", "To that end, pixiedust_node is an add-on for Jupyter notebooks that allows Node.js/JavaScript to run inside notebook cells. Not only can web developers use the same workflow for collaborating in Node.js, but they can also use the same tools to work with existing data scientists coding in Python.\n", "\n", "pixiedust_node is built on the popular PixieDust helper library. Let’s get started.\n", "\n", "> Note: Run one cell at a time or unexpected results might be observed.\n", "\n", "\n", "## Part 1: Variables, functions, and promises\n", "\n", "\n", "### Installing\n", "Install the [`pixiedust`](https://pypi.python.org/pypi/pixiedust) and [`pixiedust_node`](https://pypi.python.org/pypi/pixiedust-node) packages using `pip`, the Python package manager. 
" ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Requirement already up-to-date: pixiedust in /opt/conda/envs/DSX-Python27/lib/python2.7/site-packages\n", "Requirement not upgraded as not directly required: lxml in /opt/conda/envs/DSX-Python27/lib/python2.7/site-packages (from pixiedust)\n", "Requirement not upgraded as not directly required: geojson in /opt/conda/envs/DSX-Python27/lib/python2.7/site-packages (from pixiedust)\n", "Requirement not upgraded as not directly required: colour in /opt/conda/envs/DSX-Python27/lib/python2.7/site-packages (from pixiedust)\n", "Requirement not upgraded as not directly required: mpld3 in /opt/conda/envs/DSX-Python27/lib/python2.7/site-packages (from pixiedust)\n", "Requirement not upgraded as not directly required: astunparse in /opt/conda/envs/DSX-Python27/lib/python2.7/site-packages (from pixiedust)\n", "Requirement not upgraded as not directly required: markdown in /opt/conda/envs/DSX-Python27/lib/python2.7/site-packages (from pixiedust)\n", "Requirement not upgraded as not directly required: six<2.0,>=1.6.1 in /opt/conda/envs/DSX-Python27/lib/python2.7/site-packages (from astunparse->pixiedust)\n", "Requirement not upgraded as not directly required: wheel<1.0,>=0.23.0 in /opt/conda/envs/DSX-Python27/lib/python2.7/site-packages (from astunparse->pixiedust)\n", "Requirement already up-to-date: pixiedust_node in /opt/conda/envs/DSX-Python27/lib/python2.7/site-packages\n", "Requirement not upgraded as not directly required: pixiedust in /opt/conda/envs/DSX-Python27/lib/python2.7/site-packages (from pixiedust_node)\n", "Requirement not upgraded as not directly required: pandas in /opt/conda/envs/DSX-Python27/lib/python2.7/site-packages (from pixiedust_node)\n", "Requirement not upgraded as not directly required: ipython in /opt/conda/envs/DSX-Python27/lib/python2.7/site-packages (from pixiedust_node)\n", "Requirement not upgraded as not 
directly required: lxml in /opt/conda/envs/DSX-Python27/lib/python2.7/site-packages (from pixiedust->pixiedust_node)\n", "Requirement not upgraded as not directly required: geojson in /opt/conda/envs/DSX-Python27/lib/python2.7/site-packages (from pixiedust->pixiedust_node)\n", "Requirement not upgraded as not directly required: colour in /opt/conda/envs/DSX-Python27/lib/python2.7/site-packages (from pixiedust->pixiedust_node)\n", "Requirement not upgraded as not directly required: mpld3 in /opt/conda/envs/DSX-Python27/lib/python2.7/site-packages (from pixiedust->pixiedust_node)\n", "Requirement not upgraded as not directly required: astunparse in /opt/conda/envs/DSX-Python27/lib/python2.7/site-packages (from pixiedust->pixiedust_node)\n", "Requirement not upgraded as not directly required: markdown in /opt/conda/envs/DSX-Python27/lib/python2.7/site-packages (from pixiedust->pixiedust_node)\n", "Requirement not upgraded as not directly required: python-dateutil in /opt/conda/envs/DSX-Python27/lib/python2.7/site-packages (from pandas->pixiedust_node)\n", "Requirement not upgraded as not directly required: pytz>=2011k in /opt/conda/envs/DSX-Python27/lib/python2.7/site-packages (from pandas->pixiedust_node)\n", "Requirement not upgraded as not directly required: numpy>=1.9.0 in /opt/conda/envs/DSX-Python27/lib/python2.7/site-packages (from pandas->pixiedust_node)\n", "Requirement not upgraded as not directly required: setuptools>=18.5 in /opt/conda/envs/DSX-Python27/lib/python2.7/site-packages (from ipython->pixiedust_node)\n", "Requirement not upgraded as not directly required: decorator in /opt/conda/envs/DSX-Python27/lib/python2.7/site-packages (from ipython->pixiedust_node)\n", "Requirement not upgraded as not directly required: pickleshare in /opt/conda/envs/DSX-Python27/lib/python2.7/site-packages (from ipython->pixiedust_node)\n", "Requirement not upgraded as not directly required: simplegeneric>0.8 in /opt/conda/envs/DSX-Python27/lib/python2.7/site-packages 
(from ipython->pixiedust_node)\n", "Requirement not upgraded as not directly required: traitlets>=4.2 in /opt/conda/envs/DSX-Python27/lib/python2.7/site-packages (from ipython->pixiedust_node)\n", "Requirement not upgraded as not directly required: prompt_toolkit<2.0.0,>=1.0.4 in /opt/conda/envs/DSX-Python27/lib/python2.7/site-packages (from ipython->pixiedust_node)\n", "Requirement not upgraded as not directly required: pygments in /opt/conda/envs/DSX-Python27/lib/python2.7/site-packages (from ipython->pixiedust_node)\n", "Requirement not upgraded as not directly required: pexpect in /opt/conda/envs/DSX-Python27/lib/python2.7/site-packages (from ipython->pixiedust_node)\n", "Requirement not upgraded as not directly required: backports.shutil_get_terminal_size in /opt/conda/envs/DSX-Python27/lib/python2.7/site-packages (from ipython->pixiedust_node)\n", "Requirement not upgraded as not directly required: pathlib2 in /opt/conda/envs/DSX-Python27/lib/python2.7/site-packages (from ipython->pixiedust_node)\n", "Requirement not upgraded as not directly required: six<2.0,>=1.6.1 in /opt/conda/envs/DSX-Python27/lib/python2.7/site-packages (from astunparse->pixiedust->pixiedust_node)\n", "Requirement not upgraded as not directly required: wheel<1.0,>=0.23.0 in /opt/conda/envs/DSX-Python27/lib/python2.7/site-packages (from astunparse->pixiedust->pixiedust_node)\n", "Requirement not upgraded as not directly required: ipython_genutils in /opt/conda/envs/DSX-Python27/lib/python2.7/site-packages (from traitlets>=4.2->ipython->pixiedust_node)\n", "Requirement not upgraded as not directly required: enum34 in /opt/conda/envs/DSX-Python27/lib/python2.7/site-packages (from traitlets>=4.2->ipython->pixiedust_node)\n", "Requirement not upgraded as not directly required: wcwidth in /opt/conda/envs/DSX-Python27/lib/python2.7/site-packages (from prompt_toolkit<2.0.0,>=1.0.4->ipython->pixiedust_node)\n", "Requirement not upgraded as not directly required: scandir in 
/opt/conda/envs/DSX-Python27/lib/python2.7/site-packages (from pathlib2->ipython->pixiedust_node)\n" ] } ], "source": [ "# install or upgrade the packages\n", "# restart the kernel to pick up the latest version\n", "!pip install pixiedust --upgrade\n", "!pip install pixiedust_node --upgrade" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Using pixiedust_node\n", "Now we can import `pixiedust_node` into our notebook:" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Pixiedust database opened successfully\n" ] }, { "data": { "text/html": [ "\n", "
\n", " \n", " \n", " \n", " Pixiedust version 1.1.11\n", "
\n", " " ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "\n", "
\n", " \n", " \n", " \n", " Pixiedust Node.js \n", "
\n" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "name": "stdout", "output_type": "stream", "text": [ "pixiedust_node 0.2.5 started. Cells starting '%%node' may contain Node.js code.\n" ] } ], "source": [ "import pixiedust_node" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "And then we can write JavaScript code in cells whose first line is `%%node`:" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [], "source": [ "%%node\n", "// get the current date\n", "var date = new Date();" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "It’s that easy! We can have Python and Node.js in the same notebook. Cells are Python by default, but simply starting a cell with `%%node` indicates that the next lines will be JavaScript." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Displaying HTML and images in notebook cells\n", "We can use the `html` function to render HTML code in a cell:" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "data": { "text/html": [ "

Quote

\"Imagination is more important than knowledge\"\n", "Albert Einstein
" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "%%node\n", "var str = '

Quote

\"Imagination is more important than knowledge\"\\nAlbert Einstein
';\n", "html(str)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "If we have an image we want to render, we can do that with the `image` function:" ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "data": { "text/html": [ "" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "%%node\n", "var url = 'https://github.com/IBM/nodejs-in-notebooks/blob/master/notebooks/images/pixiedust_node_schematic.png?raw=true';\n", "image(url);" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Printing JavaScript variables\n", "\n", "Print variables using `console.log`." ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "{ a: 1, b: 'two', c: true }\n" ] } ], "source": [ "%%node\n", "var x = { a:1, b:'two', c: true };\n", "console.log(x);" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Calling the `print` function within your JavaScript code is the same as calling `print` in your Python code." ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "{\"a\": 3, \"c\": false, \"b\": \"four\"}\n" ] } ], "source": [ "%%node\n", "var y = { a:3, b:'four', c: false };\n", "print(y);" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Visualizing data using PixieDust\n", "You can also use PixieDust’s `display` function to render data graphically. 
Configuring the output as a line chart, the visualization looks as follows: " ] }, { "cell_type": "code", "execution_count": 8, "metadata": { "pixiedust": { "displayParams": { "aggregation": "SUM", "chartsize": "99", "handlerId": "lineChart", "keyFields": "x", "rowCount": "500", "valueFields": "cos,sin" } } }, "outputs": [], "source": [ "%%node\n", "var data = [];\n", "for (var i = 0; i < 1000; i++) {\n", " var x = 2*Math.PI * i/ 360;\n", " var obj = {\n", " x: x,\n", " i: i,\n", " sin: Math.sin(x),\n", " cos: Math.cos(x),\n", " tan: Math.tan(x)\n", " };\n", " data.push(obj);\n", "}\n", "// render data \n", "display(data);" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "\n", "PixieDust presents visualisations of DataFrames using Matplotlib, Bokeh, Brunel, d3, Google Maps, and Mapbox. No code is required on your part because PixieDust presents simple pull-down menus and a friendly point-and-click interface, allowing you to configure how the data is presented:\n", "\n", "" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Adding npm modules\n", "There are thousands of libraries and tools in the registry of npm, Node.js’s package manager. It’s essential that we can install npm libraries and use them in our notebook code.\n", "Let’s say we want to make some HTTP calls to an external API service. 
We could use Node.js’s low-level HTTP library directly, but an easier option is to use the widely used `request` npm module.\n", "Once we have pixiedust_node set up, installing an npm module is as simple as running `npm.install` in a Python cell:" ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "/opt/conda/envs/DSX-Python27/bin/npm install -s request\n", "+ request@2.87.0\n", "updated 1 package in 0.856s\n" ] } ], "source": [ "npm.install('request');" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Once installed, you may `require` the module in your JavaScript code:" ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "... ... ... ...\n", "... ...\n", "{ iss_position: { longitude: '42.2119', latitude: '51.1124' },\n", "timestamp: 1531779053,\n", "message: 'success' }\n" ] } ], "source": [ "%%node\n", "var request = require('request');\n", "var r = {\n", " method:'GET',\n", " url: 'http://api.open-notify.org/iss-now.json',\n", " json: true\n", "};\n", "request(r, function(err, req, body) {\n", " console.log(body);\n", "});\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "As an HTTP request is an asynchronous action, the `request` library calls our callback function when the operation has completed. Inside that function, we can call `console.log` to print the data.\n", "We can organise our code into functions to encapsulate complexity and make it easier to reuse code. We can create a function to get the current position of the International Space Station in one notebook cell:" ] }, { "cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "... ..... ..... ..... ..... ... ..... ..... ....... ....... ....... ....... ....... ..... ..... 
...\n" ] } ], "source": [ "%%node\n", "var request = require('request');\n", "var getPosition = function(callback) {\n", " var r = {\n", " method:'GET',\n", " url: 'http://api.open-notify.org/iss-now.json',\n", " json: true\n", " };\n", " request(r, function(err, req, body) {\n", " var obj = null;\n", " if (!err) {\n", " obj = body.iss_position\n", " obj.latitude = parseFloat(obj.latitude);\n", " obj.longitude = parseFloat(obj.longitude);\n", " obj.time = new Date().getTime(); \n", " }\n", " callback(err, obj);\n", " });\n", "};" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "And use it in another cell:" ] }, { "cell_type": "code", "execution_count": 12, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "... ...\n", "{ longitude: 42.9493, latitude: 51.1819, time: 1531779061073 }\n" ] } ], "source": [ "%%node\n", "getPosition(function(err, data) {\n", " console.log(data);\n", "});" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Promises\n", "If you prefer to work with JavaScript Promises when writing asynchronous code, then that’s okay too. Let’s rewrite our `getPosition` function to return a Promise. First we're going to install the `request-promise` module from npm:" ] }, { "cell_type": "code", "execution_count": 13, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "/opt/conda/envs/DSX-Python27/bin/npm install -s request request-promise\n", "+ request@2.87.0\n", "+ request-promise@4.2.2\n", "updated 2 packages in 0.912s\n" ] } ], "source": [ "npm.install( ('request', 'request-promise') )" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Notice how you can install multiple modules in a single call. Just pass in a Python `list` or `tuple`.\n", "Then we can refactor our function a little:" ] }, { "cell_type": "code", "execution_count": 14, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "... ..... ..... ..... ..... ... 
..... ..... ..... ..... ..... ..... ..... ...\n" ] } ], "source": [ "%%node\n", "var request = require('request-promise');\n", "var getPosition = function() {\n", " var r = {\n", " method:'GET',\n", " url: 'http://api.open-notify.org/iss-now.json',\n", " json: true\n", " };\n", " return request(r).then(function(body) {\n", " // convert the coordinate strings to numbers and add a timestamp\n", " var obj = body.iss_position;\n", " obj.latitude = parseFloat(obj.latitude);\n", " obj.longitude = parseFloat(obj.longitude);\n", " obj.time = new Date().getTime(); \n", " return obj;\n", " });\n", "};" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "And call it in the Promises style:" ] }, { "cell_type": "code", "execution_count": 15, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "... ... ... ...\n", "{ longitude: 44.0843, latitude: 51.2787, time: 1531779072259 }\n" ] } ], "source": [ "%%node\n", "getPosition().then(function(data) {\n", " console.log(data);\n", "}).catch(function(err) {\n", " console.error(err); \n", "});" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Or call it in a more compact form:" ] }, { "cell_type": "code", "execution_count": 16, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "{ longitude: 44.6288, latitude: 51.3208, time: 1531779077984 }\n" ] } ], "source": [ "%%node\n", "getPosition().then(console.log).catch(console.error);" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "In the next part of this notebook we'll illustrate how you can access local and remote data sources from within the notebook." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "***\n", "# Part 2: Working with data sources\n", "\n", "You can access any data source using your favorite public or home-grown packages. 
In the second part of this notebook you'll learn how to retrieve data from an Apache CouchDB (or Cloudant) database and visualize it using PixieDust or third-party libraries.\n", "\n", "## Accessing Cloudant data sources\n", "\n", "\n", "To access data stored in an Apache CouchDB or Cloudant database, we can use the [`cloudant-quickstart`](https://www.npmjs.com/package/cloudant-quickstart) npm module:" ] }, { "cell_type": "code", "execution_count": 17, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "/opt/conda/envs/DSX-Python27/bin/npm install -s cloudant-quickstart\n", "+ cloudant-quickstart@1.25.5\n", "updated 1 package in 0.983s\n" ] } ], "source": [ "npm.install('cloudant-quickstart')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "With our Cloudant URL, we can start exploring the data in Node.js. First we make a connection to the remote Cloudant database:" ] }, { "cell_type": "code", "execution_count": 18, "metadata": {}, "outputs": [], "source": [ "%%node\n", "// connect to Cloudant using cloudant-quickstart\n", "const cqs = require('cloudant-quickstart');\n", "const cities = cqs('https://56953ed8-3fba-4f7e-824e-5498c8e1d18e-bluemix.cloudant.com/cities');" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "> For this code pattern example, a remote database has been pre-configured to accept anonymous connection requests. If you wish to explore the `cloudant-quickstart` library beyond what is covered in this nodebook, we recommend that you create your own replica and replace the URL above with your own, e.g. `https://myid:mypassword@mycloudanthost/mydatabase`.\n", "\n", "Now we have an object named `cities` that we can use to access the database. \n", "\n", "### Exploring the data using Node.js in a notebook \n", "\n", "We can retrieve all documents using `all`." 
] }, { "cell_type": "code", "execution_count": 19, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[ { _id: '1000501',\n", "name: 'Grahamstown',\n", "latitude: -33.30422,\n", "longitude: 26.53276,\n", "country: 'ZA',\n", "population: 91548,\n", "timezone: 'Africa/Johannesburg' },\n", "{ _id: '1000543',\n", "name: 'Graaff-Reinet',\n", "latitude: -32.25215,\n", "longitude: 24.53075,\n", "country: 'ZA',\n", "population: 62896,\n", "timezone: 'Africa/Johannesburg' },\n", "{ _id: '100077',\n", "name: 'Abū Ghurayb',\n", "latitude: 33.30563,\n", "longitude: 44.18477,\n", "country: 'IQ',\n", "population: 900000,\n", "timezone: 'Asia/Baghdad' } ]\n" ] } ], "source": [ "%%node\n", "// If no limit is specified, 100 documents will be returned\n", "cities.all({limit:3}).then(console.log).catch(console.error)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Specifying the optional `limit` and `skip` parameters we can paginate through the document list:\n", "\n", "```\n", "cities.all({limit:10}).then(console.log).catch(console.error)\n", "cities.all({skip:10, limit:10}).then(console.log).catch(console.error)\n", "```" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "If we know the IDs of documents, we can retrieve them singly:" ] }, { "cell_type": "code", "execution_count": 20, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "{ _id: '2636749',\n", "name: 'Stowmarket',\n", "latitude: 52.18893,\n", "longitude: 0.99774,\n", "country: 'GB',\n", "population: 15394,\n", "timezone: 'Europe/London' }\n" ] } ], "source": [ "%%node\n", "cities.get('2636749').then(console.log).catch(console.error);" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Or in bulk:" ] }, { "cell_type": "code", "execution_count": 21, "metadata": { "scrolled": true }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[ { _id: '5913490',\n", "name: 'Calgary',\n", "latitude: 51.05011,\n", 
"longitude: -114.08529,\n", "country: 'CA',\n", "population: 1019942,\n", "timezone: 'America/Edmonton' },\n", "{ _id: '4140963',\n", "name: 'Washington, D.C.',\n", "latitude: 38.89511,\n", "longitude: -77.03637,\n", "country: 'US',\n", "population: 601723,\n", "timezone: 'America/New_York' },\n", "{ _id: '3520274',\n", "name: 'Río Blanco',\n", "latitude: 18.83036,\n", "longitude: -97.156,\n", "country: 'MX',\n", "population: 39543,\n", "timezone: 'America/Mexico_City' } ]\n" ] } ], "source": [ "%%node\n", "cities.get(['5913490', '4140963','3520274']).then(console.log).catch(console.error);" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Instead of just calling `print` to output the JSON, we can bring PixieDust's `display` function to bear by passing it an array of data to visualize. Using mapbox as renderer and satelite as basemap, we can display the location and population of the selected cities: " ] }, { "cell_type": "code", "execution_count": 22, "metadata": { "pixiedust": { "displayParams": { "basemap": "satellite-v9", "chartsize": "76", "coloropacity": "53", "colorrampname": "Orange to Purple", "handlerId": "mapView", "keyFields": "latitude,longitude", "kind": "simple-cluster", "legend": "false", "mapboxtoken": "pk.eyJ1IjoibWFwYm94IiwiYSI6ImNpejY4M29iazA2Z2gycXA4N2pmbDZmangifQ.-g_vE53SD2WrJ6tFX7QHmA", "rendererId": "mapbox", "rowCount": "500", "valueFields": "population,name" } }, "scrolled": false }, "outputs": [], "source": [ "%%node\n", "cities.get(['5913490', '4140963','3520274']).then(display).catch(console.error);" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "We can also query a subset of the data using the `query` function, passing it a [Cloudant Query](https://cloud.ibm.com/docs/services/Cloudant/api/cloudant_query.html#query) statement. 
Using mapbox as renderer, the customizable output looks as follows:" ] }, { "cell_type": "code", "execution_count": 23, "metadata": { "pixiedust": { "displayParams": { "basemap": "outdoors-v9", "colorrampname": "Yellow to Blue", "handlerId": "mapView", "keyFields": "latitude,longitude", "mapboxtoken": "pk.eyJ1IjoibWFwYm94IiwiYSI6ImNpejY4M29iazA2Z2gycXA4N2pmbDZmangifQ.-g_vE53SD2WrJ6tFX7QHmA", "rowCount": "500", "valueFields": "name,population" } }, "scrolled": false }, "outputs": [], "source": [ "%%node\n", "// fetch cities in UK above latitude 54 degrees north\n", "cities.query({country:'GB', latitude: { \"$gt\": 54}}).then(display).catch(console.error);" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "\n", "### Aggregating data\n", "The `cloudant-quickstart` library also allows aggregations (sum, count, stats) to be performed in the Cloudant database.\n", "Let’s calculate the sum of the population field:" ] }, { "cell_type": "code", "execution_count": 24, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "2694222973\n", "\n" ] } ], "source": [ "%%node\n", "cities.sum('population').then(console.log).catch(console.error);" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Or compute the sum of the `population`, grouped by the `country` field, displaying 10 countries with the largest population:" ] }, { "cell_type": "code", "execution_count": 25, "metadata": { "pixiedust": { "displayParams": { "aggregation": "SUM", "handlerId": "barChart", "keyFields": "name", "mapboxtoken": "pk.eyJ1IjoibWFwYm94IiwiYSI6ImNpejY4M29iazA2Z2gycXA4N2pmbDZmangifQ.-g_vE53SD2WrJ6tFX7QHmA", "orientation": "vertical", "rendererId": "google", "rowCount": "100", "sortby": "Values DESC", "valueFields": "population" } }, "scrolled": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "... ... ... ..... ..... ... ... ..... ..... ... ... ..... ..... 
...\n", "CN 389,487,480\n", "IN 269,553,896\n", "US 190,515,768\n", "BR 125,426,547\n", "RU 108,885,695\n", "JP 99,000,238\n", "MX 80,474,387\n", "ID 63,161,801\n", "DE 58,884,999\n", "TR 55,733,719\n" ] } ], "source": [ "%%node\n", "\n", "// helper function\n", "function top10(data) {\n", " // convert input data structure to array\n", " var pop_array = [];\n", " Object.keys(data).forEach(function(n,k) {\n", " pop_array.push({name: n, population: data[n]});\n", " });\n", " // sort array by population in descending order\n", " pop_array.sort(function(a,b) {\n", " return b.population - a.population; \n", " });\n", " // display top 10 entries\n", " pop_array.slice(0,10).forEach(function(e) {\n", " console.log(e.name + ' ' + e.population.toLocaleString()); \n", " });\n", "}\n", "\n", "// fetch aggregated data and invoke helper routine\n", "cities.sum('population','country').then(top10).catch(console.error);" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The `cloudant-quickstart` package is just one of several Node.js libraries that you can use to access Apache CouchDB or Cloudant. Follow [this link](https://medium.com/ibm-watson-data-lab/choosing-a-cloudant-library-d14c06f3d714) to learn more about your options. " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Visualizing data using custom charts\n", "\n", "If you prefer, you can also use third-party Node.js charting packages to visualize your data, such as [`quiche`](https://www.npmjs.com/package/quiche)." ] }, { "cell_type": "code", "execution_count": 26, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "/opt/conda/envs/DSX-Python27/bin/npm install -s quiche\n", "+ quiche@0.3.0\n", "updated 1 package in 0.957s\n" ] } ], "source": [ "npm.install('quiche');" ] }, { "cell_type": "code", "execution_count": 27, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "... ... ... ... ... ... ... ... 
...\n" ] }, { "data": { "text/html": [ "" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "%%node\n", "var Quiche = require('quiche');\n", "var pie = new Quiche('pie');\n", "\n", "// fetch all cities named Cambridge\n", "cities.query({name: 'Cambridge'}).then(function(data) {\n", "\n", " var colors = ['ff00ff','0055ff', 'ff0000', 'ffff00', '00ff00','0000ff'];\n", " for (var i in data) {\n", " var city = data[i];\n", " pie.addData(city.population, city.name + '(' + city.country +')', colors[i]);\n", " }\n", " var imageUrl = pie.getUrl(true);\n", " image(imageUrl); \n", "}).catch(console.error);" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "***\n", "# Part 3: Sharing data between Python and Node.js cells\n", "\n", "You can share variables between Python and Node.js cells. Why would you want to do that? Read on.\n", "\n", "The Node.js library ecosystem is extensive. Perhaps you need to fetch data from a database and prefer the syntax of a particular Node.js npm module. You can use Node.js to fetch the data, move it to the Python environment, and convert it into a Pandas or Spark DataFrame for aggregation, analysis and visualisation.\n", "\n", "PixieDust and pixiedust_node give you the flexibility to mix and match Python and Node.js code to suit the workflow you are building and the skill sets you have in your team.\n", "\n", "Mixing Node.js and Python code in the same notebook is a great way to integrate the work of your software development and data science teams to produce a collaborative report or dashboard.\n", "\n", "\n", "### Sharing data\n", "\n", "Define variables in a Python cell." 
] }, { "cell_type": "code", "execution_count": 28, "metadata": {}, "outputs": [], "source": [ "# define a couple of variables in Python\n", "a = 'Hello from Python!'\n", "b = 2\n", "c = False\n", "d = {'x':1, 'y':2}\n", "e = 3.142\n", "f = [{'a':1}, {'a':2}, {'a':3}]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Access or modify their values in Node.js cells." ] }, { "cell_type": "code", "execution_count": 29, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Hello from Python! 2 false { y: 2, x: 1 } 3.142 [ { a: 1 }, { a: 2 }, { a: 3 } ]\n" ] } ], "source": [ "%%node\n", "// print variable values\n", "console.log(a, b, c, d, e, f);\n", "\n", "// change variable value \n", "a = 'Hello from Node.js!';\n", "\n", "// define a new variable\n", "var g = 'Yes, it works both ways.';" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Inspect the manipulated data." ] }, { "cell_type": "code", "execution_count": 30, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Hello from Node.js! Yes, it works both ways.\n" ] } ], "source": [ "# display modified variable and the new variable\n", "print('{} {}'.format(a,g))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Note:** PixieDust natively supports [data sharing between Python and Scala](https://ibm-watson-data-lab.github.io/pixiedust/scalabridge.html), extending the loop for some data types:\n", " ```\n", " %%scala\n", " println(a,b,c,d,e,f,g)\n", " \n", " (Hello from Node.js!,2,null,null,null,null,Yes, it works both ways.)\n", " ```" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Sharing data from an asynchronous callback\n", "\n", "If you wish to transfer data from Node.js to Python from an asynchronous callback, make sure you write the data to a global variable. \n", "\n", "Load a CSV file from a GitHub repository." 
] }, { "cell_type": "code", "execution_count": 31, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "... ... ...\n", "Fetched sample data from GitHub.\n" ] } ], "source": [ "%%node\n", "\n", "// global variable\n", "var sample_csv_data = '';\n", "\n", "// load csv file from GitHub and store data in the global variable\n", "request.get('https://github.com/ibm-watson-data-lab/open-data/raw/master/cars/cars.csv').then(function(data) {\n", " sample_csv_data = data;\n", " console.log('Fetched sample data from GitHub.');\n", "});" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Create a Pandas DataFrame from the downloaded data." ] }, { "cell_type": "code", "execution_count": 32, "metadata": { "pixiedust": { "displayParams": {} } }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
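<table><thead><tr><th></th><th>mpg</th><th>cylinders</th><th>engine</th><th>horsepower</th><th>weight</th><th>acceleration</th><th>year</th><th>origin</th><th>name</th></tr></thead><tbody>
<tr><th>0</th><td>18.0</td><td>8</td><td>307.0</td><td>130</td><td>3504</td><td>12.0</td><td>70</td><td>American</td><td>chevrolet chevelle malibu</td></tr>
<tr><th>1</th><td>15.0</td><td>8</td><td>350.0</td><td>165</td><td>3693</td><td>11.5</td><td>70</td><td>American</td><td>buick skylark 320</td></tr>
<tr><th>2</th><td>18.0</td><td>8</td><td>318.0</td><td>150</td><td>3436</td><td>11.0</td><td>70</td><td>American</td><td>plymouth satellite</td></tr>
<tr><th>3</th><td>16.0</td><td>8</td><td>304.0</td><td>150</td><td>3433</td><td>12.0</td><td>70</td><td>American</td><td>amc rebel sst</td></tr>
<tr><th>4</th><td>17.0</td><td>8</td><td>302.0</td><td>140</td><td>3449</td><td>10.5</td><td>70</td><td>American</td><td>ford torino</td></tr></tbody></table>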
\n", "
" ], "text/plain": [ " mpg cylinders engine horsepower weight acceleration year origin \\\n", "0 18.0 8 307.0 130 3504 12.0 70 American \n", "1 15.0 8 350.0 165 3693 11.5 70 American \n", "2 18.0 8 318.0 150 3436 11.0 70 American \n", "3 16.0 8 304.0 150 3433 12.0 70 American \n", "4 17.0 8 302.0 140 3449 10.5 70 American \n", "\n", " name \n", "0 chevrolet chevelle malibu \n", "1 buick skylark 320 \n", "2 plymouth satellite \n", "3 amc rebel sst \n", "4 ford torino " ] }, "execution_count": 33, "metadata": {}, "output_type": "execute_result" } ], "source": [ "import pandas as pd\n", "import io\n", "# create DataFrame from shared csv data\n", "pandas_df = pd.read_csv(io.StringIO(sample_csv_data))\n", "# display first five rows\n", "pandas_df.head(5)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Note**: Above example is for illustrative purposes only. A much easier solution is to use [PixieDust's sampleData method](https://ibm-watson-data-lab.github.io/pixiedust/loaddata.html#load-a-csv-using-its-url) if you want to create a DataFrame from a URL. 
" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### References:\n", " * [Nodebooks: Introducing Node.js Data Science Notebooks](https://medium.com/ibm-watson-data-lab/nodebooks-node-js-data-science-notebooks-aa140bea21ba)\n", " * [Nodebooks: Sharing Data Between Node.js & Python](https://medium.com/ibm-watson-data-lab/nodebooks-sharing-data-between-node-js-python-3a4acae27a02)\n", " * [Sharing Variables Between Python & Node.js in Jupyter Notebooks](https://medium.com/ibm-watson-data-lab/sharing-variables-between-python-node-js-in-jupyter-notebooks-682a79d4bdd9)" ] } ], "metadata": { "kernelspec": { "display_name": "Python 2.7", "language": "python", "name": "python2" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 2 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython2", "version": "2.7.15" } }, "nbformat": 4, "nbformat_minor": 2 }