{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "hvPlot provides one API to explore data of many different types. Previous sections have exclusively worked with tabular data stored in pandas (or pandas-like) DataFrames. The other most common type of data are n-dimensional arrays. hvPlot aims to eventually support different array libraries but for now focuses on [xarray](http://xarray.pydata.org/en/stable/). XArray provides a convenient and very powerful wrapper to label the axis and coordinates of multi-dimensional (n-D) arrays. This user guide will cover how to leverage ``xarray`` and ``hvplot`` to visualize and explore data of different dimensionality ranging from simple 1D data, to 2D image-like data, to multi-dimensional cubes of data.\n",
    "\n",
    "For these examples we’ll use the North American air temperature dataset:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import xarray as xr\n",
    "import hvplot.xarray  # noqa\n",
    "\n",
    "air_ds = xr.tutorial.open_dataset('air_temperature').load()\n",
    "air = air_ds.air\n",
    "air_ds"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 1D Plots"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Selecting the data at a particular lat/lon coordinate we get a 1D dataset of air temperatures over time:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "air1d = air.sel(lat=40, lon=285)\n",
    "air1d.hvplot()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Notice how the axes are already appropriately labeled, because xarray stores the metadata required. We can also further subselect the data and use `*` to overlay plots:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "air1d_sel = air1d.sel(time='2013-01')\n",
    "air1d_sel.hvplot(color='purple') * air1d_sel.hvplot.scatter(marker='o', color='blue', size=15)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "air.lat"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Selecting multiple\n",
    "\n",
    "If we select multiple coordinates along one axis and plot a chart type, the data will automatically be split by the coordinate:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "air.sel(lat=[20, 40, 60], lon=285).hvplot.line()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "To plot a different relationship we can explicitly request to display the latitude along the y-axis and use the ``by`` keyword to color each longitude (or 'lon') differently (note that this differs from the ``hue`` keyword xarray uses):"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "air.sel(time='2013-02-01 00:00', lon=[280, 285]).hvplot.line(y='lat', by='lon', legend='top_right')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 2D Plots"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "By default the ``DataArray.hvplot()`` method generates an image if the data is two-dimensional."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "air2d = air.sel(time='2013-06-01 12:00')\n",
    "air2d.hvplot(width=400)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Alternatively we can also plot the same data using the ``contour`` and ``contourf`` methods, which provide a ``levels`` argument to control the number of iso-contours to draw:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "air2d.hvplot.contour(width=400, levels=20) + air2d.hvplot.contourf(width=400, levels=8)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## n-D Plots"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "If the data has more than two dimensions it will default to a histogram without providing it further hints:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "air.hvplot()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "However we can tell it to apply a ``groupby`` along a particular dimension, allowing us to explore the data as images along that dimension with a slider:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "air.hvplot(groupby='time', width=500)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "By default, for numeric types you'll get a slider and for non-numeric types you'll get a selector. Use ``widget_type`` and ``widget_location`` to control the look of the widget. To learn more about customizing widget behavior see [Widgets](Widgets.ipynb)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "air.hvplot(groupby='time', width=600, widget_type='scrubber', widget_location='bottom')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "If we pick a different, lower dimensional plot type (such as a 'line') it will automatically apply a groupby over the remaining dimensions:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "air.hvplot.line(width=600)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Statistical plots"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Statistical plots such as histograms, kernel-density estimates, or violin and box-whisker plots aggregate the data across one or more of the coordinate dimensions. For instance, plotting a KDE provides a summary of all the air temperature values but we can, once again, use the ``by`` keyword to view each selected latitude (or 'lat') separately:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "air.sel(lat=[25, 50, 75]).hvplot.kde('air', by='lat', alpha=0.5)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Using the ``by`` keyword we can break down the distribution of the air temperature across one or more variables:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "air.hvplot.violin('air', by='lat', color='lat', cmap='Category20')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Rasterizing\n",
    "\n",
    "If you are plotting a large amount of data at once, you can consider using the hvPlot interface to [Datashader](http://datashader.org), which can be enabled simply by setting `rasterize=True`.\n",
    "\n",
    "Note that by declaring that the data should not be grouped by another coordinate variable, i.e. by setting `groupby=[]`, we can plot all the datapoints, showing us the spread of air temperatures in the dataset:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "air.hvplot.scatter('time', groupby=[], rasterize=True) *\\\n",
    "air.mean(['lat', 'lon']).hvplot.line('time', color='indianred')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Here we also overlaid a non-datashaded line plot of the average temperature at each time.  If you enable the appropriate hover tool, the overlaid data supports hovering and zooming even in a static export such as on a web server or in an email, while the raw-data plot has been aggregated spatially before it is sent to the browser, and thus it has only the fixed spatial binning available at that time.  If you have a live Python process, the raw data will be aggregated each time you pan or zoom, letting you see the entire dataset regardless of size."
   ]
  }
 ],
 "metadata": {
  "language_info": {
   "name": "python",
   "pygments_lexer": "ipython3"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}