{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Using `perspective.Table`\n",
    "\n",
    "This notebook provides an overview of `perspective.Table`, Perspective's core component that allows for lightning-fast data loading, query, update, and transformation.\n",
    "\n",
    "Tables can be used alone to manage datasets, or connected for data to flow quickly between multiple tables. Outside of a Jupyter context, Perspective tables can be used to create efficient [servers](https://github.com/finos/perspective/tree/master/examples/tornado-python) which allow data to be hosted and viewed by clients in the browser using Perspective's Javascript library."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from perspective import Table\n",
    "from datetime import date, datetime\n",
    "import numpy as np\n",
    "import pandas as pd\n",
    "import requests"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Supported Data Formats\n",
    "\n",
    "Perspective supports 6 core data types: `int`, `float`, `str`, `bool`, `datetime`, and `date`, and several data formats:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Pandas DataFrames\n",
    "df = pd.DataFrame({\n",
    "    \"a\": np.arange(0, 2),\n",
    "    \"b\": np.array([\"a\", \"b\"], dtype=object),\n",
    "    \"nullable\": [1.5, np.nan], # perspective handles `None` and `np.nan` values\n",
    "    \"mixed\": [None, 1]\n",
    "})\n",
    "\n",
    "# Column-oriented\n",
    "data = {\n",
    "    \"int\": [i for i in range(4)],\n",
    "    \"float\": [i * 1.25 for i in range(4)],\n",
    "    \"str\": [\"a\", \"b\", \"c\", \"d\"],\n",
    "    \"bool\": [True, False, True, False],\n",
    "    \"date\": [date.today() for i in range(4)],\n",
    "    \"datetime\": [datetime.now() for i in range(4)]\n",
    "}\n",
    "\n",
    "# Row-oriented\n",
    "rows = [{\"a\": 1, \"b\": True}, {\"a\": 2, \"b\": False}]\n",
    "\n",
    "# CSV strings\n",
    "csv = df.to_csv()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Schemas\n",
    "\n",
    "To explicitly specify data types for columns, create a schema (a `dict` of `str` column names to data types):"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "schema = {\n",
    "    \"int\": float,\n",
    "    \"float\": int,\n",
    "    \"str\": str,\n",
    "    \"bool\": bool,\n",
    "    \"date\": datetime,\n",
    "    \"datetime\": datetime\n",
    "}"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Creating a Table\n",
    "\n",
    "A Table can be created by passing in a dataset or a schema, like the ones created above:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# From a dataset\n",
    "table = Table(data)\n",
    "\n",
    "# Or a dataframe\n",
    "df_table = Table(df)\n",
    "\n",
    "# Or a CSV\n",
    "csv_table = Table(csv)\n",
    "\n",
    "# tables can be created from schema\n",
    "table2 = Table(schema)\n",
    "assert table2.size() == 0\n",
    "\n",
    "# constructing a table with an index, which is a column name to be used as the primary key\n",
    "indexed = Table(data, index=\"str\")\n",
    "\n",
    "# or a limit, which is a total cap on the number of rows in the table - updates past `limit` overwite at row 0\n",
    "limited = Table(data, limit=2)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Using the Table\n",
    "\n",
    "A Table has several queryable properties:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# schema() returns a mapping of column names to data types\n",
    "display(\"Table schema:\", table.schema())\n",
    "\n",
    "# size() returns the number of rows in the table\n",
    "display(\"Table has {} rows\".format(table.size()))\n",
    "\n",
    "# columns() returns a List of the table's column names\n",
    "display(\"Table columns:\", table.columns())"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Updating with new data\n",
    "\n",
    "To update or stream new data into the Table, call `table.update(data)`:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# you can update all columns\n",
    "table.update(data)\n",
    "print(\"after update:\", table.size())\n",
    "\n",
    "# or however many you'd like\n",
    "table.update({\n",
    "    \"int\": [5, 6, 7],\n",
    "    \"str\": [\"x\", \"y\", \"z\"]\n",
    "})\n",
    "\n",
    "# but you cannot add new columns through updating - create a new Table instead\n",
    "try:\n",
    "    table.update({\n",
    "        \"abcd\": [1]\n",
    "    })\n",
    "except:\n",
    "    pass\n",
    "\n",
    "# updates on unindexed tables always append\n",
    "print(\"after append:\", table.size())\n",
    "\n",
    "# updates on indexed tables should include the primary key - the new data overwrites at the row specified by the primary key\n",
    "indexed.update([{\"str\": \"b\", \"int\": 100}])\n",
    "print(\"after indexed partial update:\", indexed.size())\n",
    "\n",
    "# without a primary key, the update appends to the end of the dataset\n",
    "indexed.update([{\"int\": 101}])\n",
    "print(\"after indexed append:\", indexed.size())"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Queries and transformations using `View`\n",
    "\n",
    "`table.view()` allows you to apply various pivots, aggregates, sorts, filters, column selections, and expression computations on the Table, as well as return the results in a variety of output data formats.\n",
    "\n",
    "To create a view, call the `view()` method on an existing table:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "view = table.view() # a view with zero transformations - returns the dataset as passed in\n",
    "\n",
    "# view metadata\n",
    "print(\"View has {} rows and {} columns\".format(view.num_rows(), view.num_columns()))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Applying transformations\n",
    "\n",
    "To apply transformations, pass in the relevant `kwargs` into the constructor:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "pivoted = table.view(row_pivots=[\"int\"], column_pivots=[\"str\"]) # group and split the underlying dataset\n",
    "\n",
    "aggregated = table.view(row_pivots=[\"int\"], aggregates={\"float\": \"avg\"}) # specify aggregations for individual columns\n",
    "\n",
    "subset = table.view(columns=[\"float\"]) # show only the columns you're interested in\n",
    "\n",
    "sorted_view = table.view(sort=[[\"str\", \"desc\"], [\"int\", \"asc\"]]) # sort on a specific column, or multiple columns\n",
    "\n",
    "filtered = table.view(filter=[[\"int\", \">\", 2]]) # filter the dataset on a specific value\n",
    "\n",
    "expressions = table.view(expressions=['\"int\" + \"float\" / 100']) # calculate arbitrary expressions over the dataset"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Serializing Data\n",
    "\n",
    "Views are used to serialize data to the user in several formats:\n",
    "- `to_records`: outputs a list of dictionaries, each of which is a single row\n",
    "- `to_dict`: outputs a dictionary of lists, each string key the name of a column\n",
    "- `to_numpy`: outputs a dictionary of numpy arrays\n",
    "- `to_df`: outputs a `pandas.DataFrame`\n",
    "- `to_arrow`: outputs an Apache Arrow binary, which can be passed into another `perspective.Table` to create a copy of the first table"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "rows = view.to_records()\n",
    "\n",
    "columnar = view.to_dict()\n",
    "\n",
    "np_out = view.to_numpy()\n",
    "\n",
    "df_out = view.to_df()\n",
    "\n",
    "arrow_out = view.to_arrow()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Data from pivoted or otherwise transformed views reflect the state of the transformed dataset."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "filtered_df = filtered.to_df()\n",
    "filtered_df"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "If the table is updated with data, views are automatically notified of the updates:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "v1 = table.view()\n",
    "print(\"v1 has {} rows and {} columns\".format(v1.num_rows(), v1.num_columns()))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "table.update({\"int\": [100, 200, 300, 400]})"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "print(\"v1 has {} rows and {} columns\".format(v1.num_rows(), v1.num_columns()))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Using callbacks to connect `Table` instances\n",
    "\n",
    "Custom callback functions can be applied on the `Table` and `View` instances.\n",
    "\n",
    "The most useful is `View.on_update`, which triggers a callback after the Table has been updated:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# The `delta` property is an Arrow binary containing updated rows\n",
    "def callback(port_id, delta):\n",
    "    new_table = Table(delta)\n",
    "    display(new_table.view().to_dict())\n",
    "    \n",
    "table = Table(data)\n",
    "view = table.view()\n",
    "\n",
    "# Register the callback with `mode=\"row\"` to enable pushing back updated data\n",
    "view.on_update(callback, mode=\"row\")\n",
    "\n",
    "# Update will trigger the callback\n",
    "table.update({\n",
    "    \"int\": [1, 3],\n",
    "    \"str\": [\"abc\", \"def\"]\n",
    "})"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Because the callback can be triggered with a _copy_ of the updated data, `on_update` allows you to connect together multiple tables that all share state quickly and dependably:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Create a table and a view\n",
    "t1 = Table(data)\n",
    "v1 = t1.view()\n",
    "\n",
    "# And a new table that feeds from `t1`\n",
    "t2 = Table(t1.schema())\n",
    "\n",
    "# And a callback that updates `t2` whenever `t1` updates\n",
    "def cb(port_id, delta):\n",
    "    t2.update(delta)\n",
    "    \n",
    "# register the callback\n",
    "v1.on_update(cb, mode=\"row\")\n",
    "    \n",
    "# update t1, which updates t2 automatically\n",
    "t1.update(data)\n",
    "\n",
    "# t2 now has data after t1 is updated\n",
    "t2.view().to_df()"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.7.9"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}