{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# 10 minutes to Koalas\n",
"\n",
"This is a short introduction to Koalas, geared mainly for new users. This notebook shows you some key differences between pandas and Koalas. You can run this examples by yourself on a live notebook [here](https://mybinder.org/v2/gh/databricks/koalas/master?filepath=docs%2Fsource%2Fgetting_started%2F10min.ipynb). For Databricks Runtime, you can import and run [the current .ipynb file](https://raw.githubusercontent.com/databricks/koalas/master/docs/source/getting_started/10min.ipynb) out of the box. Try it on [Databricks Community Edition](https://community.cloud.databricks.com/) for free.\n",
"\n",
"Customarily, we import Koalas as follows:"
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [],
"source": [
"import pandas as pd\n",
"import numpy as np\n",
"import databricks.koalas as ks\n",
"from pyspark.sql import SparkSession"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Object Creation\n",
"\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Creating a Koalas Series by passing a list of values, letting Koalas create a default integer index:"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [],
"source": [
"s = ks.Series([1, 3, 5, np.nan, 6, 8])"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"0 1.0\n",
"1 3.0\n",
"2 5.0\n",
"3 NaN\n",
"4 6.0\n",
"5 8.0\n",
"dtype: float64"
]
},
"execution_count": 3,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"s"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Creating a Koalas DataFrame by passing a dict of objects that can be converted to series-like."
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"outputs": [],
"source": [
"kdf = ks.DataFrame(\n",
" {'a': [1, 2, 3, 4, 5, 6],\n",
" 'b': [100, 200, 300, 400, 500, 600],\n",
" 'c': [\"one\", \"two\", \"three\", \"four\", \"five\", \"six\"]},\n",
" index=[10, 20, 30, 40, 50, 60])"
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"
\n",
"\n",
"
\n",
" \n",
" \n",
" \n",
" a \n",
" b \n",
" c \n",
" \n",
" \n",
" \n",
" \n",
" 10 \n",
" 1 \n",
" 100 \n",
" one \n",
" \n",
" \n",
" 20 \n",
" 2 \n",
" 200 \n",
" two \n",
" \n",
" \n",
" 30 \n",
" 3 \n",
" 300 \n",
" three \n",
" \n",
" \n",
" 40 \n",
" 4 \n",
" 400 \n",
" four \n",
" \n",
" \n",
" 50 \n",
" 5 \n",
" 500 \n",
" five \n",
" \n",
" \n",
" 60 \n",
" 6 \n",
" 600 \n",
" six \n",
" \n",
" \n",
"
\n",
"
"
],
"text/plain": [
" a b c\n",
"10 1 100 one\n",
"20 2 200 two\n",
"30 3 300 three\n",
"40 4 400 four\n",
"50 5 500 five\n",
"60 6 600 six"
]
},
"execution_count": 5,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"kdf"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Creating a pandas DataFrame by passing a numpy array, with a datetime index and labeled columns:"
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {},
"outputs": [],
"source": [
"dates = pd.date_range('20130101', periods=6)"
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"DatetimeIndex(['2013-01-01', '2013-01-02', '2013-01-03', '2013-01-04',\n",
" '2013-01-05', '2013-01-06'],\n",
" dtype='datetime64[ns]', freq='D')"
]
},
"execution_count": 7,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"dates"
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {},
"outputs": [],
"source": [
"pdf = pd.DataFrame(np.random.randn(6, 4), index=dates, columns=list('ABCD'))"
]
},
{
"cell_type": "code",
"execution_count": 9,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" \n",
" A \n",
" B \n",
" C \n",
" D \n",
" \n",
" \n",
" \n",
" \n",
" 2013-01-01 \n",
" -0.407291 \n",
" 0.066551 \n",
" -0.073149 \n",
" 0.648219 \n",
" \n",
" \n",
" 2013-01-02 \n",
" -0.848735 \n",
" 0.437277 \n",
" 0.632657 \n",
" 0.312861 \n",
" \n",
" \n",
" 2013-01-03 \n",
" -0.415537 \n",
" -1.787072 \n",
" 0.242221 \n",
" 0.125543 \n",
" \n",
" \n",
" 2013-01-04 \n",
" -1.637271 \n",
" 1.134810 \n",
" 0.282532 \n",
" 0.133995 \n",
" \n",
" \n",
" 2013-01-05 \n",
" -1.230477 \n",
" -1.925734 \n",
" 0.736288 \n",
" -0.547677 \n",
" \n",
" \n",
" 2013-01-06 \n",
" 1.092894 \n",
" -1.071281 \n",
" 0.318752 \n",
" -0.477591 \n",
" \n",
" \n",
"
\n",
"
"
],
"text/plain": [
" A B C D\n",
"2013-01-01 -0.407291 0.066551 -0.073149 0.648219\n",
"2013-01-02 -0.848735 0.437277 0.632657 0.312861\n",
"2013-01-03 -0.415537 -1.787072 0.242221 0.125543\n",
"2013-01-04 -1.637271 1.134810 0.282532 0.133995\n",
"2013-01-05 -1.230477 -1.925734 0.736288 -0.547677\n",
"2013-01-06 1.092894 -1.071281 0.318752 -0.477591"
]
},
"execution_count": 9,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"pdf"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Now, this pandas DataFrame can be converted to a Koalas DataFrame"
]
},
{
"cell_type": "code",
"execution_count": 10,
"metadata": {},
"outputs": [],
"source": [
"kdf = ks.from_pandas(pdf)"
]
},
{
"cell_type": "code",
"execution_count": 11,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"databricks.koalas.frame.DataFrame"
]
},
"execution_count": 11,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"type(kdf)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"It looks and behaves the same as a pandas DataFrame though"
]
},
{
"cell_type": "code",
"execution_count": 12,
"metadata": {
"scrolled": true
},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" \n",
" A \n",
" B \n",
" C \n",
" D \n",
" \n",
" \n",
" \n",
" \n",
" 2013-01-01 \n",
" -0.407291 \n",
" 0.066551 \n",
" -0.073149 \n",
" 0.648219 \n",
" \n",
" \n",
" 2013-01-02 \n",
" -0.848735 \n",
" 0.437277 \n",
" 0.632657 \n",
" 0.312861 \n",
" \n",
" \n",
" 2013-01-03 \n",
" -0.415537 \n",
" -1.787072 \n",
" 0.242221 \n",
" 0.125543 \n",
" \n",
" \n",
" 2013-01-04 \n",
" -1.637271 \n",
" 1.134810 \n",
" 0.282532 \n",
" 0.133995 \n",
" \n",
" \n",
" 2013-01-05 \n",
" -1.230477 \n",
" -1.925734 \n",
" 0.736288 \n",
" -0.547677 \n",
" \n",
" \n",
" 2013-01-06 \n",
" 1.092894 \n",
" -1.071281 \n",
" 0.318752 \n",
" -0.477591 \n",
" \n",
" \n",
"
\n",
"
"
],
"text/plain": [
" A B C D\n",
"2013-01-01 -0.407291 0.066551 -0.073149 0.648219\n",
"2013-01-02 -0.848735 0.437277 0.632657 0.312861\n",
"2013-01-03 -0.415537 -1.787072 0.242221 0.125543\n",
"2013-01-04 -1.637271 1.134810 0.282532 0.133995\n",
"2013-01-05 -1.230477 -1.925734 0.736288 -0.547677\n",
"2013-01-06 1.092894 -1.071281 0.318752 -0.477591"
]
},
"execution_count": 12,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"kdf"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Also, it is possible to create a Koalas DataFrame from Spark DataFrame. \n",
"\n",
"Creating a Spark DataFrame from pandas DataFrame"
]
},
{
"cell_type": "code",
"execution_count": 13,
"metadata": {},
"outputs": [],
"source": [
"spark = SparkSession.builder.getOrCreate()"
]
},
{
"cell_type": "code",
"execution_count": 14,
"metadata": {},
"outputs": [],
"source": [
"sdf = spark.createDataFrame(pdf)"
]
},
{
"cell_type": "code",
"execution_count": 15,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"+--------------------+-------------------+--------------------+-------------------+\n",
"| A| B| C| D|\n",
"+--------------------+-------------------+--------------------+-------------------+\n",
"|-0.40729126067930577|0.06655086061836445|-0.07314878758440578| 0.6482187447085683|\n",
"| -0.848735274668907|0.43727685786558224| 0.6326566086816865| 0.312860815784838|\n",
"|-0.41553692955141575|-1.7870717259038067| 0.24222142308402184| 0.125543462922973|\n",
"| -1.637270523583917| 1.1348099198020765| 0.2825324338895592|0.13399483028402598|\n",
"| -1.2304766522352943|-1.9257342346663335| 0.7362879432261002|-0.5476765308367703|\n",
"| 1.0928943198263723|-1.0712812856772376| 0.31875224896792975|-0.4775906715060247|\n",
"+--------------------+-------------------+--------------------+-------------------+\n",
"\n"
]
}
],
"source": [
"sdf.show()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Creating Koalas DataFrame from Spark DataFrame.\n",
"`to_koalas()` is automatically attached to Spark DataFrame and available as an API when Koalas is imported."
]
},
{
"cell_type": "code",
"execution_count": 16,
"metadata": {},
"outputs": [],
"source": [
"kdf = sdf.to_koalas()"
]
},
{
"cell_type": "code",
"execution_count": 17,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" \n",
" A \n",
" B \n",
" C \n",
" D \n",
" \n",
" \n",
" \n",
" \n",
" 0 \n",
" -0.407291 \n",
" 0.066551 \n",
" -0.073149 \n",
" 0.648219 \n",
" \n",
" \n",
" 1 \n",
" -0.848735 \n",
" 0.437277 \n",
" 0.632657 \n",
" 0.312861 \n",
" \n",
" \n",
" 2 \n",
" -0.415537 \n",
" -1.787072 \n",
" 0.242221 \n",
" 0.125543 \n",
" \n",
" \n",
" 3 \n",
" -1.637271 \n",
" 1.134810 \n",
" 0.282532 \n",
" 0.133995 \n",
" \n",
" \n",
" 4 \n",
" -1.230477 \n",
" -1.925734 \n",
" 0.736288 \n",
" -0.547677 \n",
" \n",
" \n",
" 5 \n",
" 1.092894 \n",
" -1.071281 \n",
" 0.318752 \n",
" -0.477591 \n",
" \n",
" \n",
"
\n",
"
"
],
"text/plain": [
" A B C D\n",
"0 -0.407291 0.066551 -0.073149 0.648219\n",
"1 -0.848735 0.437277 0.632657 0.312861\n",
"2 -0.415537 -1.787072 0.242221 0.125543\n",
"3 -1.637271 1.134810 0.282532 0.133995\n",
"4 -1.230477 -1.925734 0.736288 -0.547677\n",
"5 1.092894 -1.071281 0.318752 -0.477591"
]
},
"execution_count": 17,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"kdf"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Having specific [dtypes](http://pandas.pydata.org/pandas-docs/stable/basics.html#basics-dtypes) . Types that are common to both Spark and pandas are currently supported."
]
},
{
"cell_type": "code",
"execution_count": 18,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"A float64\n",
"B float64\n",
"C float64\n",
"D float64\n",
"dtype: object"
]
},
"execution_count": 18,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"kdf.dtypes"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Viewing Data\n",
"\n",
"See the [API Reference](https://koalas.readthedocs.io/en/latest/reference/index.html)."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"See the top rows of the frame. The results may not be the same as pandas though: unlike pandas, the data in a Spark dataframe is not _ordered_, it has no intrinsic notion of index. When asked for the head of a dataframe, Spark will just take the requested number of rows from a partition. Do not rely on it to return specific rows, use `.loc` or `iloc` instead."
]
},
{
"cell_type": "code",
"execution_count": 19,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" \n",
" A \n",
" B \n",
" C \n",
" D \n",
" \n",
" \n",
" \n",
" \n",
" 0 \n",
" -0.407291 \n",
" 0.066551 \n",
" -0.073149 \n",
" 0.648219 \n",
" \n",
" \n",
" 1 \n",
" -0.848735 \n",
" 0.437277 \n",
" 0.632657 \n",
" 0.312861 \n",
" \n",
" \n",
" 2 \n",
" -0.415537 \n",
" -1.787072 \n",
" 0.242221 \n",
" 0.125543 \n",
" \n",
" \n",
" 3 \n",
" -1.637271 \n",
" 1.134810 \n",
" 0.282532 \n",
" 0.133995 \n",
" \n",
" \n",
" 4 \n",
" -1.230477 \n",
" -1.925734 \n",
" 0.736288 \n",
" -0.547677 \n",
" \n",
" \n",
"
\n",
"
"
],
"text/plain": [
" A B C D\n",
"0 -0.407291 0.066551 -0.073149 0.648219\n",
"1 -0.848735 0.437277 0.632657 0.312861\n",
"2 -0.415537 -1.787072 0.242221 0.125543\n",
"3 -1.637271 1.134810 0.282532 0.133995\n",
"4 -1.230477 -1.925734 0.736288 -0.547677"
]
},
"execution_count": 19,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"kdf.head()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Display the index, columns, and the underlying numpy data.\n",
"\n",
"You can also retrieve the index; the index column can be ascribed to a DataFrame, see later"
]
},
{
"cell_type": "code",
"execution_count": 20,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"Int64Index([0, 1, 2, 3, 4, 5], dtype='int64')"
]
},
"execution_count": 20,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"kdf.index"
]
},
{
"cell_type": "code",
"execution_count": 21,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"Index(['A', 'B', 'C', 'D'], dtype='object')"
]
},
"execution_count": 21,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"kdf.columns"
]
},
{
"cell_type": "code",
"execution_count": 22,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"array([[-0.40729126, 0.06655086, -0.07314879, 0.64821874],\n",
" [-0.84873527, 0.43727686, 0.63265661, 0.31286082],\n",
" [-0.41553693, -1.78707173, 0.24222142, 0.12554346],\n",
" [-1.63727052, 1.13480992, 0.28253243, 0.13399483],\n",
" [-1.23047665, -1.92573423, 0.73628794, -0.54767653],\n",
" [ 1.09289432, -1.07128129, 0.31875225, -0.47759067]])"
]
},
"execution_count": 22,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"kdf.to_numpy()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Describe shows a quick statistic summary of your data"
]
},
{
"cell_type": "code",
"execution_count": 23,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" \n",
" A \n",
" B \n",
" C \n",
" D \n",
" \n",
" \n",
" \n",
" \n",
" count \n",
" 6.000000 \n",
" 6.000000 \n",
" 6.000000 \n",
" 6.000000 \n",
" \n",
" \n",
" mean \n",
" -0.574403 \n",
" -0.524242 \n",
" 0.356550 \n",
" 0.032558 \n",
" \n",
" \n",
" std \n",
" 0.945349 \n",
" 1.255721 \n",
" 0.291566 \n",
" 0.463350 \n",
" \n",
" \n",
" min \n",
" -1.637271 \n",
" -1.925734 \n",
" -0.073149 \n",
" -0.547677 \n",
" \n",
" \n",
" 25% \n",
" -1.230477 \n",
" -1.787072 \n",
" 0.242221 \n",
" -0.477591 \n",
" \n",
" \n",
" 50% \n",
" -0.848735 \n",
" -1.071281 \n",
" 0.282532 \n",
" 0.125543 \n",
" \n",
" \n",
" 75% \n",
" -0.407291 \n",
" 0.437277 \n",
" 0.632657 \n",
" 0.312861 \n",
" \n",
" \n",
" max \n",
" 1.092894 \n",
" 1.134810 \n",
" 0.736288 \n",
" 0.648219 \n",
" \n",
" \n",
"
\n",
"
"
],
"text/plain": [
" A B C D\n",
"count 6.000000 6.000000 6.000000 6.000000\n",
"mean -0.574403 -0.524242 0.356550 0.032558\n",
"std 0.945349 1.255721 0.291566 0.463350\n",
"min -1.637271 -1.925734 -0.073149 -0.547677\n",
"25% -1.230477 -1.787072 0.242221 -0.477591\n",
"50% -0.848735 -1.071281 0.282532 0.125543\n",
"75% -0.407291 0.437277 0.632657 0.312861\n",
"max 1.092894 1.134810 0.736288 0.648219"
]
},
"execution_count": 23,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"kdf.describe()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Transposing your data"
]
},
{
"cell_type": "code",
"execution_count": 24,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" \n",
" 0 \n",
" 1 \n",
" 2 \n",
" 3 \n",
" 4 \n",
" 5 \n",
" \n",
" \n",
" \n",
" \n",
" A \n",
" -0.407291 \n",
" -0.848735 \n",
" -0.415537 \n",
" -1.637271 \n",
" -1.230477 \n",
" 1.092894 \n",
" \n",
" \n",
" B \n",
" 0.066551 \n",
" 0.437277 \n",
" -1.787072 \n",
" 1.134810 \n",
" -1.925734 \n",
" -1.071281 \n",
" \n",
" \n",
" C \n",
" -0.073149 \n",
" 0.632657 \n",
" 0.242221 \n",
" 0.282532 \n",
" 0.736288 \n",
" 0.318752 \n",
" \n",
" \n",
" D \n",
" 0.648219 \n",
" 0.312861 \n",
" 0.125543 \n",
" 0.133995 \n",
" -0.547677 \n",
" -0.477591 \n",
" \n",
" \n",
"
\n",
"
"
],
"text/plain": [
" 0 1 2 3 4 5\n",
"A -0.407291 -0.848735 -0.415537 -1.637271 -1.230477 1.092894\n",
"B 0.066551 0.437277 -1.787072 1.134810 -1.925734 -1.071281\n",
"C -0.073149 0.632657 0.242221 0.282532 0.736288 0.318752\n",
"D 0.648219 0.312861 0.125543 0.133995 -0.547677 -0.477591"
]
},
"execution_count": 24,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"kdf.T"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Sorting by its index"
]
},
{
"cell_type": "code",
"execution_count": 25,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" \n",
" A \n",
" B \n",
" C \n",
" D \n",
" \n",
" \n",
" \n",
" \n",
" 5 \n",
" 1.092894 \n",
" -1.071281 \n",
" 0.318752 \n",
" -0.477591 \n",
" \n",
" \n",
" 4 \n",
" -1.230477 \n",
" -1.925734 \n",
" 0.736288 \n",
" -0.547677 \n",
" \n",
" \n",
" 3 \n",
" -1.637271 \n",
" 1.134810 \n",
" 0.282532 \n",
" 0.133995 \n",
" \n",
" \n",
" 2 \n",
" -0.415537 \n",
" -1.787072 \n",
" 0.242221 \n",
" 0.125543 \n",
" \n",
" \n",
" 1 \n",
" -0.848735 \n",
" 0.437277 \n",
" 0.632657 \n",
" 0.312861 \n",
" \n",
" \n",
" 0 \n",
" -0.407291 \n",
" 0.066551 \n",
" -0.073149 \n",
" 0.648219 \n",
" \n",
" \n",
"
\n",
"
"
],
"text/plain": [
" A B C D\n",
"5 1.092894 -1.071281 0.318752 -0.477591\n",
"4 -1.230477 -1.925734 0.736288 -0.547677\n",
"3 -1.637271 1.134810 0.282532 0.133995\n",
"2 -0.415537 -1.787072 0.242221 0.125543\n",
"1 -0.848735 0.437277 0.632657 0.312861\n",
"0 -0.407291 0.066551 -0.073149 0.648219"
]
},
"execution_count": 25,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"kdf.sort_index(ascending=False)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Sorting by value"
]
},
{
"cell_type": "code",
"execution_count": 26,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" \n",
" A \n",
" B \n",
" C \n",
" D \n",
" \n",
" \n",
" \n",
" \n",
" 4 \n",
" -1.230477 \n",
" -1.925734 \n",
" 0.736288 \n",
" -0.547677 \n",
" \n",
" \n",
" 2 \n",
" -0.415537 \n",
" -1.787072 \n",
" 0.242221 \n",
" 0.125543 \n",
" \n",
" \n",
" 5 \n",
" 1.092894 \n",
" -1.071281 \n",
" 0.318752 \n",
" -0.477591 \n",
" \n",
" \n",
" 0 \n",
" -0.407291 \n",
" 0.066551 \n",
" -0.073149 \n",
" 0.648219 \n",
" \n",
" \n",
" 1 \n",
" -0.848735 \n",
" 0.437277 \n",
" 0.632657 \n",
" 0.312861 \n",
" \n",
" \n",
" 3 \n",
" -1.637271 \n",
" 1.134810 \n",
" 0.282532 \n",
" 0.133995 \n",
" \n",
" \n",
"
\n",
"
"
],
"text/plain": [
" A B C D\n",
"4 -1.230477 -1.925734 0.736288 -0.547677\n",
"2 -0.415537 -1.787072 0.242221 0.125543\n",
"5 1.092894 -1.071281 0.318752 -0.477591\n",
"0 -0.407291 0.066551 -0.073149 0.648219\n",
"1 -0.848735 0.437277 0.632657 0.312861\n",
"3 -1.637271 1.134810 0.282532 0.133995"
]
},
"execution_count": 26,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"kdf.sort_values(by='B')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Missing Data\n",
"Koalas primarily uses the value `np.nan` to represent missing data. It is by default not included in computations. \n"
]
},
{
"cell_type": "code",
"execution_count": 27,
"metadata": {},
"outputs": [],
"source": [
"pdf1 = pdf.reindex(index=dates[0:4], columns=list(pdf.columns) + ['E'])"
]
},
{
"cell_type": "code",
"execution_count": 28,
"metadata": {},
"outputs": [],
"source": [
"pdf1.loc[dates[0]:dates[1], 'E'] = 1"
]
},
{
"cell_type": "code",
"execution_count": 29,
"metadata": {},
"outputs": [],
"source": [
"kdf1 = ks.from_pandas(pdf1)"
]
},
{
"cell_type": "code",
"execution_count": 30,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" \n",
" A \n",
" B \n",
" C \n",
" D \n",
" E \n",
" \n",
" \n",
" \n",
" \n",
" 2013-01-01 \n",
" -0.407291 \n",
" 0.066551 \n",
" -0.073149 \n",
" 0.648219 \n",
" 1.0 \n",
" \n",
" \n",
" 2013-01-02 \n",
" -0.848735 \n",
" 0.437277 \n",
" 0.632657 \n",
" 0.312861 \n",
" 1.0 \n",
" \n",
" \n",
" 2013-01-03 \n",
" -0.415537 \n",
" -1.787072 \n",
" 0.242221 \n",
" 0.125543 \n",
" NaN \n",
" \n",
" \n",
" 2013-01-04 \n",
" -1.637271 \n",
" 1.134810 \n",
" 0.282532 \n",
" 0.133995 \n",
" NaN \n",
" \n",
" \n",
"
\n",
"
"
],
"text/plain": [
" A B C D E\n",
"2013-01-01 -0.407291 0.066551 -0.073149 0.648219 1.0\n",
"2013-01-02 -0.848735 0.437277 0.632657 0.312861 1.0\n",
"2013-01-03 -0.415537 -1.787072 0.242221 0.125543 NaN\n",
"2013-01-04 -1.637271 1.134810 0.282532 0.133995 NaN"
]
},
"execution_count": 30,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"kdf1"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"To drop any rows that have missing data."
]
},
{
"cell_type": "code",
"execution_count": 31,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" \n",
" A \n",
" B \n",
" C \n",
" D \n",
" E \n",
" \n",
" \n",
" \n",
" \n",
" 2013-01-01 \n",
" -0.407291 \n",
" 0.066551 \n",
" -0.073149 \n",
" 0.648219 \n",
" 1.0 \n",
" \n",
" \n",
" 2013-01-02 \n",
" -0.848735 \n",
" 0.437277 \n",
" 0.632657 \n",
" 0.312861 \n",
" 1.0 \n",
" \n",
" \n",
"
\n",
"
"
],
"text/plain": [
" A B C D E\n",
"2013-01-01 -0.407291 0.066551 -0.073149 0.648219 1.0\n",
"2013-01-02 -0.848735 0.437277 0.632657 0.312861 1.0"
]
},
"execution_count": 31,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"kdf1.dropna(how='any')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Filling missing data."
]
},
{
"cell_type": "code",
"execution_count": 32,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" \n",
" A \n",
" B \n",
" C \n",
" D \n",
" E \n",
" \n",
" \n",
" \n",
" \n",
" 2013-01-01 \n",
" -0.407291 \n",
" 0.066551 \n",
" -0.073149 \n",
" 0.648219 \n",
" 1.0 \n",
" \n",
" \n",
" 2013-01-02 \n",
" -0.848735 \n",
" 0.437277 \n",
" 0.632657 \n",
" 0.312861 \n",
" 1.0 \n",
" \n",
" \n",
" 2013-01-03 \n",
" -0.415537 \n",
" -1.787072 \n",
" 0.242221 \n",
" 0.125543 \n",
" 5.0 \n",
" \n",
" \n",
" 2013-01-04 \n",
" -1.637271 \n",
" 1.134810 \n",
" 0.282532 \n",
" 0.133995 \n",
" 5.0 \n",
" \n",
" \n",
"
\n",
"
"
],
"text/plain": [
" A B C D E\n",
"2013-01-01 -0.407291 0.066551 -0.073149 0.648219 1.0\n",
"2013-01-02 -0.848735 0.437277 0.632657 0.312861 1.0\n",
"2013-01-03 -0.415537 -1.787072 0.242221 0.125543 5.0\n",
"2013-01-04 -1.637271 1.134810 0.282532 0.133995 5.0"
]
},
"execution_count": 32,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"kdf1.fillna(value=5)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Operations"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Stats\n",
"Operations in general exclude missing data.\n",
"\n",
"Performing a descriptive statistic:"
]
},
{
"cell_type": "code",
"execution_count": 33,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"A -0.574403\n",
"B -0.524242\n",
"C 0.356550\n",
"D 0.032558\n",
"dtype: float64"
]
},
"execution_count": 33,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"kdf.mean()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Spark Configurations\n",
"\n",
"Various configurations in PySpark could be applied internally in Koalas.\n",
"For example, you can enable Arrow optimization to hugely speed up internal pandas conversion. See PySpark Usage Guide for Pandas with Apache Arrow ."
]
},
{
"cell_type": "code",
"execution_count": 34,
"metadata": {},
"outputs": [],
"source": [
"prev = spark.conf.get(\"spark.sql.execution.arrow.enabled\") # Keep its default value.\n",
"ks.set_option(\"compute.default_index_type\", \"distributed\") # Use default index prevent overhead.\n",
"import warnings\n",
"warnings.filterwarnings(\"ignore\") # Ignore warnings coming from Arrow optimizations."
]
},
{
"cell_type": "code",
"execution_count": 35,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"493 ms ± 157 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)\n"
]
}
],
"source": [
"spark.conf.set(\"spark.sql.execution.arrow.enabled\", True)\n",
"%timeit ks.range(300000).to_pandas()"
]
},
{
"cell_type": "code",
"execution_count": 36,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"1.39 s ± 109 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)\n"
]
}
],
"source": [
"spark.conf.set(\"spark.sql.execution.arrow.enabled\", False)\n",
"%timeit ks.range(300000).to_pandas()"
]
},
{
"cell_type": "code",
"execution_count": 37,
"metadata": {},
"outputs": [],
"source": [
"ks.reset_option(\"compute.default_index_type\")\n",
"spark.conf.set(\"spark.sql.execution.arrow.enabled\", prev) # Set its default value back."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Grouping\n",
"By “group by” we are referring to a process involving one or more of the following steps:\n",
"\n",
"- Splitting the data into groups based on some criteria\n",
"- Applying a function to each group independently\n",
"- Combining the results into a data structure"
]
},
{
"cell_type": "code",
"execution_count": 38,
"metadata": {},
"outputs": [],
"source": [
"kdf = ks.DataFrame({'A': ['foo', 'bar', 'foo', 'bar',\n",
" 'foo', 'bar', 'foo', 'foo'],\n",
" 'B': ['one', 'one', 'two', 'three',\n",
" 'two', 'two', 'one', 'three'],\n",
" 'C': np.random.randn(8),\n",
" 'D': np.random.randn(8)})"
]
},
{
"cell_type": "code",
"execution_count": 39,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" \n",
" A \n",
" B \n",
" C \n",
" D \n",
" \n",
" \n",
" \n",
" \n",
" 0 \n",
" foo \n",
" one \n",
" 1.028745 \n",
" -0.804571 \n",
" \n",
" \n",
" 1 \n",
" bar \n",
" one \n",
" 0.593379 \n",
" -1.592110 \n",
" \n",
" \n",
" 2 \n",
" foo \n",
" two \n",
" 0.051362 \n",
" 0.466273 \n",
" \n",
" \n",
" 3 \n",
" bar \n",
" three \n",
" 0.977622 \n",
" -0.822670 \n",
" \n",
" \n",
" 4 \n",
" foo \n",
" two \n",
" -1.105357 \n",
" -0.027466 \n",
" \n",
" \n",
" 5 \n",
" bar \n",
" two \n",
" -0.009076 \n",
" 0.977587 \n",
" \n",
" \n",
" 6 \n",
" foo \n",
" one \n",
" 0.643092 \n",
" 0.403405 \n",
" \n",
" \n",
" 7 \n",
" foo \n",
" three \n",
" -1.451129 \n",
" 0.230347 \n",
" \n",
" \n",
"
\n",
"
"
],
"text/plain": [
" A B C D\n",
"0 foo one 1.028745 -0.804571\n",
"1 bar one 0.593379 -1.592110\n",
"2 foo two 0.051362 0.466273\n",
"3 bar three 0.977622 -0.822670\n",
"4 foo two -1.105357 -0.027466\n",
"5 bar two -0.009076 0.977587\n",
"6 foo one 0.643092 0.403405\n",
"7 foo three -1.451129 0.230347"
]
},
"execution_count": 39,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"kdf"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Grouping and then applying the [sum()](https://koalas.readthedocs.io/en/latest/reference/api/databricks.koalas.groupby.GroupBy.sum.html#databricks.koalas.groupby.GroupBy.sum) function to the resulting groups."
]
},
{
"cell_type": "code",
"execution_count": 40,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" \n",
" C \n",
" D \n",
" \n",
" \n",
" A \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" bar \n",
" 1.561925 \n",
" -1.437193 \n",
" \n",
" \n",
" foo \n",
" -0.833286 \n",
" 0.267988 \n",
" \n",
" \n",
"
\n",
"
"
],
"text/plain": [
" C D\n",
"A \n",
"bar 1.561925 -1.437193\n",
"foo -0.833286 0.267988"
]
},
"execution_count": 40,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"kdf.groupby('A').sum()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Grouping by multiple columns forms a hierarchical index, and again we can apply the sum function."
]
},
{
"cell_type": "code",
"execution_count": 41,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" \n",
" \n",
" C \n",
" D \n",
" \n",
" \n",
" A \n",
" B \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" bar \n",
" one \n",
" 0.593379 \n",
" -1.592110 \n",
" \n",
" \n",
" three \n",
" 0.977622 \n",
" -0.822670 \n",
" \n",
" \n",
" two \n",
" -0.009076 \n",
" 0.977587 \n",
" \n",
" \n",
" foo \n",
" one \n",
" 1.671837 \n",
" -0.401166 \n",
" \n",
" \n",
" three \n",
" -1.451129 \n",
" 0.230347 \n",
" \n",
" \n",
" two \n",
" -1.053995 \n",
" 0.438807 \n",
" \n",
" \n",
"
\n",
"
"
],
"text/plain": [
" C D\n",
"A B \n",
"bar one 0.593379 -1.592110\n",
" three 0.977622 -0.822670\n",
" two -0.009076 0.977587\n",
"foo one 1.671837 -0.401166\n",
" three -1.451129 0.230347\n",
" two -1.053995 0.438807"
]
},
"execution_count": 41,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"kdf.groupby(['A', 'B']).sum()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Plotting\n",
"See the Plotting docs."
]
},
{
"cell_type": "code",
"execution_count": 42,
"metadata": {},
"outputs": [],
"source": [
"%matplotlib inline\n",
"from matplotlib import pyplot as plt"
]
},
{
"cell_type": "code",
"execution_count": 43,
"metadata": {},
"outputs": [],
"source": [
"pser = pd.Series(np.random.randn(1000),\n",
" index=pd.date_range('1/1/2000', periods=1000))"
]
},
{
"cell_type": "code",
"execution_count": 44,
"metadata": {},
"outputs": [],
"source": [
"kser = ks.Series(pser)"
]
},
{
"cell_type": "code",
"execution_count": 45,
"metadata": {},
"outputs": [],
"source": [
"kser = kser.cummax()"
]
},
{
"cell_type": "code",
"execution_count": 46,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
""
]
},
"execution_count": 46,
"metadata": {},
"output_type": "execute_result"
},
{
"data": {
"image/png": "iVBORw0KGgoAAAANSUhEUgAAAXQAAAEECAYAAAA4Qc+SAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADh0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uMy4xLjEsIGh0dHA6Ly9tYXRwbG90bGliLm9yZy8QZhcZAAAStElEQVR4nO3dfYwd1X3G8efxC2/hxQZvwQGbrRq3DaDykpVLRP8goSGUpqFVqEpIE6CVrEapAlKqKEojUPmjEq1KpQhSZEIUgxAKAkrAhbYuJQ2owdHaMSZgyksIAeLENi/GTozD7vz6x51dlvWs78vOvXfmzPcjXe3dmeOZc/fIz549M+eMI0IAgPpbMOwKAADKQaADQCIIdABIBIEOAIkg0AEgEQQ6ACRiUbsCtg+T9F1Jh+bl74qIa2aVuVzSP0p6Jd90Q0R8/WDHXbZsWYyOjvZQZQBork2bNu2KiJGifW0DXdJ+SR+OiL22F0t61PaDEfHYrHLfioi/7rRSo6OjGh8f77Q4AECS7Rfn2tc20KM182hv/u3i/MVsJAComI7G0G0vtL1F0g5JGyJiY0GxT9jeavsu2ytKrSUAoK2OAj0iJiPiDEknSVpt+7RZRe6XNBoRvyNpg6R1Rcexvcb2uO3xnTt3zqfeAIBZurrLJSLekPSwpAtmbX81Ivbn335d0gfm+PdrI2IsIsZGRgrH9AEAPWob6LZHbC/J3x8u6SOSnp5VZvmMbz8uaVuZlQQAtNfJXS7LJa2zvVCtXwB3RsR629dKGo+I+yR93vbHJU1Iek3S5f2qMACgmIe1fO7Y2Fhw2yIweFkW2rFnf/uCqKTlSw7fFBFjRfs66aEDSMg19z2p2x6b81Zm1BiBDjTMz958SyccfZiu/P1Vw64KenDpdXPvI9CBhomQjn3PIfrk6pXDrgp6cOlB9rE4F9AwESF72LVAPxDoQMOERKAnikAHGiYiZJHoKSLQgYYJSQvI8yQR6EDDZIy5JItABxqmNeSCFBHoQAMx5JImAh1omCxCZsglSQQ60DARYsglUQQ60DAR0gJ66Eki0IGGCdFFTxWBDjRMRp4ni0AHmia4DT1VBDrQMKFgDD1RBDrQMBk99GQR6EDDsDhXugh0oGFYyiVdBDrQMK0hFxI9RQQ60DQszpUsAh1oGIZc0kWgAw3D1P90EehAw2QMuSSLQAcaJrgPPVkEOtAwrTF0Ej1FBDrQMDyCLl0EOtAwDLmki0AHGibE1P9UEehAw0RIC/ifnySaFWiYjMW5kkWgAw0TEo8sShSBDjQNM0WTRaADDcNM0XQR6EDDsDhXugh0oGFYnCtdBDrQMAy5pKttoNs+zPb3bT9u+0nbf1dQ5lDb37L9nO2Ntkf7UVkA8xch7nJJVCc99P2SPhwRp0s6Q9IFts+eVeYvJb0eEe+T9M+Sriu3mgDKxH3oaWob6NGyN/92cf6KWcUukrQuf3+XpPPMcm5AJUWEFvC/M0kdjaHbXmh7i6QdkjZExMZZRU6U9JIkRcSEpN2SjiuzogDKkbE4V7I6CvSImIyIMySdJGm17dN6OZntNbbHbY/v3Lmzl0MAmCcW50pXV3e5RMQbkh6WdMGsXa9IWiFJthdJOkbSqwX/fm1EjEXE2MjISG81BjAvLM6Vrk7uchmxvSR/f7ikj0h6elax+yRdlr+/WNJ/R8TscXYAFZCxmEuyFnVQZrmkdbYXqvUL4M6IWG/7WknjEXGfpFsk3Wb7OUmvSbqkbzUGME/BGHqi2gZ6RGyVdGbB9qtnvH9L0p+WWzUA/RBB/zxVjKQBDRNi6n+qCHSgYbJgyCVVBDrQMAy5pItABxomIsRE7jQR6EDDBDNFk0WgAw3TWmyRRE8RgQ40DItzpYtABxqGxbnSRaADDRPiomiqCHSgYbhtMV0EOtAwIdFDTxSBDjRMMFM0WQQ60DAMuaSLQAcahsW50kWgAw3D4lzpIt
CBhmHIJV0EOtBEdNGTRKADDTL1qF+m/qeJQAcaJMsf3c7iXGnq5CHRGKJvb3lF//PMzmFXA4mIqUAnz5NEoFfc1x5+Xi++9gstO/LQYVcFiTj5uCN0+oolw64G+oBAr7iJLNN5v328bvzUWcOuCoCKYwy94iaz0EKuYAHoAIFecRNZaBGBDqADBHrFZVloAYEOoAMEesXRQwfQKQK94hhDB9ApAr3i6KED6BSBXnGtHjrNBKA9kqLiJrJMixbSQwfQHoFecZNZ8DACAB0h0CtukjF0AB0i0Cssy0JZiLtcAHSEQK+wyXxpPHroADpBoFfYZL549UIuigLoAIFeYRMZPXQAnSPQK2xyMu+hcx86gA6QFBU2NYbOiAuATrQNdNsrbD9s+ynbT9q+sqDMubZ3296Sv67uT3WbZSLLJEkLF/J7F0B7nTyxaELSFyJis+2jJG2yvSEinppV7pGI+Fj5VWyuScbQAXShbdcvIrZHxOb8/R5J2ySd2O+KQZqYHkMn0AG019UzRW2PSjpT0saC3R+0/bikn0r6m4h4ct61K7DvV5P63o92TYddynbs2S+JHjqAznQc6LaPlHS3pKsi4s1ZuzdLOjki9tq+UNK9klYVHGONpDWStHLlyp4qfMf3f6Jr188e7Unb0iMOGXYVANRAR4Fue7FaYX57RNwze//MgI+IB2x/zfayiNg1q9xaSWslaWxsrKcu9r63JyVJ937unEb0XA9bvEC/MXLksKsBoAbaBrptS7pF0raIuH6OMidI+nlEhO3Vao3Nv1pqTXNZfqHwtPcerUXc/QEA0zrpoZ8j6dOSnrC9Jd/2ZUkrJSkibpJ0saTP2p6QtE/SJRHRl0HuPM9llpQFgHdpG+gR8aikg6ZnRNwg6YayKnUwWf57ogGjLQDQldqNWUx1++mhA8C71S/QI0SWA8CBahfoWfBINgAoUrtAj2D8HACK1C7Qs5B88Gu0ANBItQt0xtABoFj9Al1iDB0ACtQu0LOMHjoAFKlfoAc9dAAoUrtAD9FDB4Ai9Qt0eugAUKh2gZ5xlwsAFKploNNDB4AD1S7QmSkKAMVqF+it9dBJdACYrXaBHhH00AGgQA0DnbtcAKBI7QKdu1wAoFgNA50eOgAUqV2gM1MUAIrVL9BDBDoAFKhdoDOxCACK1S7QucsFAIrVLtC5ywUAitUu0COYJwoARWoX6IyhA0Cx2gU6Y+gAUKx2gc4YOgAUq2GgSybRAeAAtQt0idUWAaBI7QI9Y6YoABSqYaBzlwsAFKldoAdj6ABQqHaBnkUwsQgACtQu0HlINAAUq1+gizF0AChSu0DPMmaKAkCR+gU6q3MBQKG2gW57he2HbT9l+0nbVxaUse2v2n7O9lbbZ/WnuoyhA8BcFnVQZkLSFyJis+2jJG2yvSEinppR5g8krcpfvyvpX/KvpWuNodfuDwsA6Lu2yRgR2yNic/5+j6Rtkk6cVewiSbdGy2OSltheXnptxUxRAJhLV11d26OSzpS0cdauEyW9NOP7l3Vg6JeCmaIAUKzjQLd9pKS7JV0VEW/2cjLba2yP2x7fuXNnL4dgpigAzKGjQLe9WK0wvz0i7iko8oqkFTO+Pynf9i4RsTYixiJibGRkpJf6KpgpCgCFOrnLxZJukbQtIq6fo9h9kj6T3+1ytqTdEbG9xHpOy7jLBQAKdXKXyzmSPi3pCdtb8m1flrRSkiLiJkkPSLpQ0nOSfinpivlU6n+f36U3971duG/3vrf1a0cdOp/DA0CS2gZ6RDyqNlN5IiIkfa6MCv141y906c2zr7m+29jo0jJOBQBJ6aSHPlC//NWkJOkrf/h+nfO+ZYVlfn3ZewZZJQCohcoFehYhSVpx7BF6//Kjh1wbAKiPyk655F5zAOhO5QJ9qodOnANAdyoX6Hmea0HlagYA1Va52Hynh04fHQC6UblAzzvoLMAFAF2qXqBP9dBJdADoSgUDvfWV6f0A0J3KBXqWBzpj6ADQncoF+t
SQCz10AOhO5QI9m74qOtRqAEDtVC7QQ1M9dBIdALpRvUCfHkMHAHSjsoG+gEF0AOhK5QKdtVwAoDeVC/R3ZooS6QDQjcoF+nQPnTwHgK5ULtDFRVEA6EnlAj0LblsEgF5ULtCnb1skzwGgK5ULdHroANCbygV6tC8CAChQvUCnhw4APalgoLe+kucA0J3KBXo2/YALEh0AulG5QJ9abZE8B4DuVC7QMx5BBwA9qVygT10UZa4oAHSncoE+hR46AHSncoH+zuJcJDoAdKNygR6MoQNATyoX6Nn0aoskOgB0o3KBHqyHDgA9qWCgt74S6ADQneoFuljLBQB6UblAz+ihA0BPKhfowVouANCTtoFu+xu2d9j+4Rz7z7W92/aW/HX1fCo0fR/6fA4CAA20qIMy35R0g6RbD1LmkYj4WBkVmp74Tw8dALrStoceEd+V9NoA6jJ1PkmMoQNAt8oaQ/+g7cdtP2j71PkciDF0AOhNJ0Mu7WyWdHJE7LV9oaR7Ja0qKmh7jaQ1krRy5crCgzGGDgC9mXcPPSLejIi9+fsHJC22vWyOsmsjYiwixkZGRuY4Xl4xeugA0JV5B7rtE5xfwbS9Oj/mq70eL5ueKjrfmgFAs7QdcrF9h6RzJS2z/bKkayQtlqSIuEnSxZI+a3tC0j5Jl8Q7T6noGastAkB32gZ6RHyyzf4b1LqtsRSshw4AvanwTNHh1gMA6qZygc566ADQm8oF+tRqi4y4AEB3qhforLYIAD2pYKCzHjoA9KJygZ5xGzoA9KRygc5MUQDoTeUCPWO1RQDoSeUCnfXQAaA3Zay22JNde/frG4++cMD2H/zkdXrnANCDoQX69t1v6dr1TxXuO2np4QOuDQDU39AC/ZTlR+s7V59fuO/wQxYOuDYAUH9DC/SFC6xjjlg8rNMDQHIqd1EUANAbAh0AEkGgA0AiCHQASASBDgCJINABIBEEOgAkwlPrjw/8xPYeSf/XYfFjJO0uoUy3ZYdVbpjn7sdnWSZp1xDOTfsN9pidtnOnx0zpZ1PmuX8rIo4q3BMRQ3lJGu+i7NoyynRbdljl6lDHLj9LR21d9c+SUvv16dxD+T9dk59Naec+2M+5LkMu95dUptuywyo3zHP347N0quqfJaX269cxyzx3Sj+bfpz7AMMcchmPiLGhnBwDRVs3A+08GAf7OQ+zh752iOfGYNHWzUA7D8acP+eh9dABAOWqyxh6Ldne22b/d2zzJ2rN0c7NUId2JtABIBF9D/R2v9VSZ/tc2+tnfH+D7cuHWKW+aXJb087NUPV2pocOAIkYSKDbPtL2Q7Y3237C9kX59lHb22zfbPtJ2/9pmweK1hht3Qy0czUNqof+lqQ/iYizJH1I0j/Zdr5vlaQbI+JUSW9I+sSA6jQoE3r3z/mwYVVkQJra1rQz7Tx0gwp0S/p721sl/ZekEyUdn+97ISK25O83SRodUJ0G5UVJp9g+1PYSSecNu0J91tS2pp1p56Eb1EOiPyVpRNIHIuJt2z/WO7/Z9s8oNykpiT/PbC+StD8iXrJ9p6QfSnpB0g+GW7O+a1Rb086083Br9m6DCvRjJO3IG/5Dkk4e0HmH6VRJz0tSRHxR0hdnF4iIcwdcp0FoWlvTzrSz8u3nDrhOB+hroE/9VpN0u6T7bT8haVzS0/0877DZ/itJn5d01bDrMihNbGvamXaumr5O/bd9uqSbI2J1306CSqCtm4F2rra+XRTNf6vdIekr/ToHqoG2bgbaufpYnAsAElFaD932CtsP234qn1BwZb79WNsbbD+bf12ab7ftr9p+zvZW22fNONZleflnbV9WVh1RjpLb+t9tvzFzOjWqoax2tn2G7e/lx9hq+8+G+bmS1uljkTp4bNJySWfl74+S9IykUyT9g6Qv5du/JOm6/P2Fkh5U637WsyVtzLcfK+lH+del+fulZdWTV3XaOt93nqQ/krR+2J+LV3/aWdJvSlqVv3+vpO2Slgz786X4Kq2HHh
HbI2Jz/n6PpG1qTTa4SNK6vNg6SX+cv79I0q3R8pikJbaXS/qopA0R8VpEvC5pg6QLyqon5q/EtlZEPCRpzyDrj86U1c4R8UxEPJsf56eSdqh1DztK1peLorZHJZ0paaOk4yNie77rZ3pnNtmJkl6a8c9ezrfNtR0VNM+2Rk2U1c62V0s6RPk93ShX6YFu+0hJd0u6KiLenLkvWn9zcRU2EbR1M5TVzvlfZbdJuiIistIrinID3fZitRr+9oi4J9/886k/r/OvO/Ltr0haMeOfn5Rvm2s7KqSktkbFldXOto+W9G+S/jYfjkEflHmXiyXdImlbRFw/Y9d9kqbuVLlM0rdnbP9MfmX8bEm78z/j/kPS+baX5lfPz8+3oSJKbGtUWFntbPsQSf+q1vj6XQOqfjOVdXVV0u+p9afXVklb8teFko6T9JCkZ9Vale3YvLwl3ajWWNoTksZmHOsvJD2Xv64Y9pVjXn1t60ck7ZS0T60x148O+/PxKredJf25pLdnHGOLpDOG/flSfDGxCAASwSPoACARBDoAJIJAB4BEEOgAkAgCHQASQaADQCIIdABIBIEOAIn4f3X9wryKje5GAAAAAElFTkSuQmCC\n",
"text/plain": [
""
]
},
"metadata": {
"needs_background": "light"
},
"output_type": "display_data"
}
],
"source": [
"kser.plot()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"On a DataFrame, the `plot()` method is a convenience that plots all of the columns with labels:"
]
},
{
"cell_type": "code",
"execution_count": 47,
"metadata": {},
"outputs": [],
"source": [
"pdf = pd.DataFrame(np.random.randn(1000, 4), index=pser.index,\n",
" columns=['A', 'B', 'C', 'D'])"
]
},
{
"cell_type": "code",
"execution_count": 48,
"metadata": {},
"outputs": [],
"source": [
"kdf = ks.from_pandas(pdf)"
]
},
{
"cell_type": "code",
"execution_count": 49,
"metadata": {},
"outputs": [],
"source": [
"kdf = kdf.cummax()"
]
},
{
"cell_type": "code",
"execution_count": 50,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
""
]
},
"execution_count": 50,
"metadata": {},
"output_type": "execute_result"
},
{
"data": {
"image/png": "iVBORw0KGgoAAAANSUhEUgAAAWoAAAEECAYAAAABJn7JAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADh0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uMy4xLjEsIGh0dHA6Ly9tYXRwbG90bGliLm9yZy8QZhcZAAAcC0lEQVR4nO3de5SU9Z3n8fe3Lk1zae4XQRAQvASSgEBMzHB21KyGsOsaxk10Ml6SzaxxN8bJMWczmt2dOZ7MScbEzG5yzDm75BjH5LgYJWNUEjUXTdQxGkERUBBvKI1AN/cG+lZV3/2jqptG7bp0P089T1V9XufU6e6qp57n2/3TL9/6Pr/n95i7IyIi8ZWIOgARESlOiVpEJOaUqEVEYk6JWkQk5pSoRURiTolaRCTmUmHsdPLkyT5nzpwwdi0iUpc2bNiwz92nvN9roSTqOXPmsH79+jB2LSJSl8zsrcFeU+tDRCTmlKhFRGJOiVpEJOaUqEVEYk6JWkQk5pSoRURiLpTpeSIiUct1dZE9fDjqMAKhRC0idenNT6+iZ8eOqMMIhBK1iNSl3t27Gb18OS0XXxR1KOW5/PJBX1KiFpG65JkMzQsXMuGzn406lPIUSdQ6mSgidcdzOchmsXQ66lACoUQtInXHMxkALFUfTQMlahGpO97TC6CKWkQktjJ9iVoVtYhILHmvKmoRkVjr61GjHrWISDypohYRibkTsz6UqEVEYkkVtYhIzHlvoaJWohYRiSfv7QE0PU9EJL4y9VVR18c/NyI1ILNvH4fuu6//Y7mEp3dXK1A/l5DXx28hUgMOP7SO9u//IOowGkZizBjSM2ZEHUYglKhFqiR7+BAkEpz90hbMLOpwpIaoRy1SJbmOoyTGjFGSloopUYtUSe5oB8kxY6IOQ2qQErVIlWQ7jpJoaYk6DKlBJXvUZtYMPAGMKGy/1t3/PuzARGpB7tgxdl73X8geOlRy255du2he8IEqRCX1ppyTid3Ahe5+1MzSwFNm9rC7PxNybCKx19O6i+PPPUfzog+Tnjqt6LZNc+YwduWnqhSZ1JOSidrdHTha+DFdeHiYQYnUCi8sUD/5S1+i5cILI45G6lVZPWozS5rZRqAN+I27PxtuWCI1om/xnzq5sELiqaxE7e5Zd18MzATONbMPvnsbM7vWzNab2fr29vag4xSJJa+zS5Ulniqa9eHuh4DHgRXv89pqd1/m7sumTJkSVHwiseaqqKUKypn1MQXodfdDZjYSuAi4NfTIRN7l6df2cc2df6I3G59TJEv3buMfgL/40XNs+3lb1OFInSqnDJgO3GVmSfIV+L3uvi7csCrn7vy3tZt4vf1o6Y2lJu053EVTMsF/Pf/0qEPpN2XzAfgjfOajczhy2vyow5EadmOR8recWR+bgHMCjCcUmZyzdkMrsyeN4rSJo6IOR0Iwf+oYzj9rKl9cPjfqUPodye1gF/BXy+fTfNaZUYcjNezGIq/VTWMtm8t/HP7ssll8+QJVNlFyd37y8k842HUw8H13AP97Q+C7HbJTtr/GIuCn2+/m2NEJUYcjdapuEnXO84k6mdCCN1Hb2bGT29bfRtKSJKy+VylY/naGRcDaNx+g/VB9/64SnbpJ1H0VdVIrk0WuO9sNwHf+zXe4eM7FEUcTroP33ceeh/6OdZ95mPT06VGHIzXMrh48d8UvUWe6YcdTkKvsLhiJ7gwXJF5g9oF9sP3VkIKTcvQezd9dI71nC/TEZ4ZGKHZtAsDe/lfo0IJLEo74JerN98EDX674baOBO5uAjYWHRKZ3RBPMOIX0H26Fzq6owwmVvzIaGIf94q9hRJ3/oySRiV+i7u7If736ARhRfoVy4HgPn//xc1x3/jxWfvCUkIKTcmQOvgIbvk1q5W0wcUHU4YTK7/0lvLAG++IvYWRz1OFILbtl2aAvxS9R57L5rzPOgeZxZb+t53AXm/wAh8Z/CE49La
TggtPz9tt0vbw16jBC4Yf289FtOZqbj3Jk3L6owwlV165CYXHaR6CpKdpgpG7FL1F7IVFbsqK3ZftnfQQdUGVyx46R6yr9cb/1+q/QvX17FSKqvtHA1wDu/wG7Io6lGhJjx+oScglV/P7r8lz+a4XTunKFWR+JCGd99La18fon/m3/+g+lTPrPf83YSy4JOarqe273c3z7uW9z25/fxunj4nMVYVhSkydjCU3Nk/DEL1H3tT4SFVbUuejnUWf27sV7e5nwub+kaX7xi24slWLsyn9HcszoKkVXPZ1NO9i5w0jPn0fzhDOiDkek5sUvUfdX1ENtfUSXqHPHjgPQsmIFo889N7I4otaby3+iSCe09KdIEOL3ea2GK+rc8WMAJEbVX5Vcib5EnUrErw4QqUXxS9SeBQwq7DXH4crEvoo6MaqxF4XKFC5WUkUtEowYJupcxdU0nEjUiUgr6kKiHt3Yibq/9ZFUohYJQiSfTXNdXeQ6O9//xY7j0JOEg5WtvJY7dISWnmOkjx4hc3BEAFFWLrM/P2c4MbrBWx9ZtT5EglT1/5OyR47w6vkX4IXq8/1NgrUfr2i/aeBegF9BlCt95BLG/9l+JzTwdK0X2l4A1PoQCUr1E/Xhw/jx44y95BJGfvjD791g60Ow81m4+B8q2u/Og8e448kdXHnebOZPGTPoduveWMfmfZsrDbtseybAxs2rQ9t/rZg9djYjktF8shGpN9X/bJrLT78bs/zPGHfppe99/eHNsPGPcNWVFe321Tf28+A7z3D5f/goE+dPHnS7LU9sYuv+o6xbFbu7iYmIvK/qfz4vzHcedFZHLjuktkHfPOpSVyZmc1mSFc7RFhGJUtUTtef6EvUgh/ZcxZePQ3+hXnIedSaXITmEWSUiIlGJ4IxXIVEPllA9W/FViVD+lYlZz5IyzUYQkdpR/URdKH2taOuj8kSdK/PKxIxnNG1MRGpKdD3qwfrQnhtaRV3mlYmZXEY9ahGpKdH1qAn2ZGKm/8rE4ttlc1n1qEWkptRNjzpXSY9arQ8RqSGRzaMedHpehbM+Xm8/ytfXbqKtI39XlXJaH6NSjb0Wh4jUlsh61IPeEaPCk4kv7jzEhrcOMmvCKC5bMpPZk4qvs6HpeSJSa6peUZ+YRx1M6yOTze/vu59ZxKnjR5bcXtPzRKTWxPDKxMqWOe07iZgqc3lTVdQiUmsiSNSletTZinrUmULPu9xErYpaRGpN/HrUFd44oK/1kSpzSl8mpwteRKS2RDCPukRFnRtaRZ1MqvUhIvUpwh71YBV1hScTK+xRZ12r54lIbYkwUQ/yeoXT87LZChN1The8iEhtieyCF3vsm7DlfW7Hte81mHFO2bvrLXMxpj7qUYtIran+POq+ivqdDTD5LBh36skbtEyHhX9R9v6yuRyphLHujXX87JWfldz+WOaYWh8iUlMiqKj9xPcf/wosunxYu8tknWTC+PVbv+bVg6+yaMqiotufN/08LjztwmEdU0SkmiLoARSm5xlDWnf63TI5J51M0JvtZd74eay+WDeWFZH6EtmNA4Ah3XLr3TLZHMmE0ZPrIZ1ID3t/IiJxUzJTmtksM3vczF42s5fM7G+Gc0AfOOsjgJN6+Yra6M320pRsGvb+RETippySNgN8zd0XAB8DvmxmC4Z8xP5FmTyY1kehR62KWkTqVclE7e673f35wvcdwFbg1OLvKrpHoDCNOoDZF5mck0ok6Mn2qKIWkbpUUZPYzOYA5wDPvs9r15rZejNb397ePvhO+i8hJ6CTiTlSSSOTy9CUUKIWkfpTdqI2szHAz4GvuvuRd7/u7qvdfZm7L5syZcqg++nvUUNgsz6SCaMn20M6qdaHiNSfss7mmVmafJK+293/ZVhHzA04mVhG6+PBF99h96HOQV9/be9R0omEetQiUrdKJmozM+AOYKu7/9PwD1l+RX3oeA83rHmh5B4/uXAaL+V6lahFpC6VU1H/GXAVsNnMNhae+4a7/2
pIR+xb68O8ZEXd0ZUB4Juf/iCXLRn8/GVzKsnH1uhkoojUp5KJ2t2fYvC17irmAy94KTGPurM3C8CEUWlGNRXfVvOoRaReRbDMaeGrASXuynKsO19Rj2oqXnnfvfVuMp5R60NE6lJ090yEkq2Pzp58RV2qmn5q11MAXDT7ouHFJiISQ9HdM7GMS8iP9Sfq4gm9K9PF0mlLOWPCGYGEKCISJ9HdMxFKzvo43lNe66Mr08XI1MhhxyYiEkdVW+b096+0cdPPN7Ps9c18GcCcy/7vs7xluwZ9T3dvea2Pzkwn01PTA4xWRCQ+qpaoN7ceZs+RLj58akv/cx8/YypnjZhW9H1TW0YwfVxz0W26sl00J4tvIyJSq6qWqPsuSPzMkpnsuTffo/7aJxfAhDnD3ndnplOtDxGpW1XrUWf7TiIOXOsjoHsXdmW6aE6pohaR+lS9ijrnnHSj8DJXzzvUdYi1r66lN9c76DadmU4lahGpW1VL1FnPr3J38q24SifqR3Y8wvef/37RbRKW4IzxmponIvWpij1qJ2HGyTe3LX34fZ37MIwNV20gWSSxJwK4/6KISBxVufVhA+ZRe8lLyAEOdh1k/IjxujxcRBpWqIna3fl/T/xPDnXuZ097BwsmdfL7LRnOBv553Fh6XroTUiOK7uP5tueZ2DwxzDBFRGIt1ET9VuvT/OOOB/I/GDABntiR42zgp+NbOPjSnWXtZ9X8VaHFKCISd6Em6u7uDgD+16mfYvOBT/C7rW383YcOs+eRH/LYpQ+QmjUvzMOLiNSFUBN1NpdfqyM5ciL70jPYnUjgIwsnBEeOC/PQIiJ1I9SpEplsDwCpRCo/6yNhJy5RLONEooiIhJ2oc/lEnUykyOacpFn/Mqf5+XkiIlJKqIm6r/WRTqTJOfkrE73vnolK1CIi5Qg1UfcWEnUqkc7Po04MqKjV+hARKUvIrY/8+hzJRLL/EnLPqfUhIlKJqrQ+UoXWx8k9alXUIiLlCHnWR19FnW992Ek96jCPLCJSP0KdR93X+kgn0mRzzqf/dD/7dzybf1E9ahGRsoSbqP1E6yPrzsK3t2CjRzH5yitJjNQdWUREyhHyycTClYnJNO7OiEw3oz/yEaZ85fowDysiUleqkqhThdZHOtODqZIWEalIdRJ1Mk3WoSnTQ6JZiVpEpBLhTs/zE4k6l82R7u0hMUqJWkSkEiFPzzvR+kj09pDAMVXUIiIVCS1R5zxHb6GiTiaaSPR05w+oHrWISEVCmZ6X9SzL71lOR08HF2/I0vb72/iPuzoBSIxsDuOQIiJ1K5SKujfbS0dPB58av5Crn8vQuXkbzZ1HaZ0xn5HnnBPGIUVE6lYoFbXjGMbKCQsZk13PiHOXcPvCa2hpTnHRPN1+S0SkEqFU1E5+4aUUgBuWTJArrJ4nIiKVCedkYmGBvBSG5+CNg93sPdKVXz1PREQqEnpF7Q7b2o6x90g3cyaPDuNwIiJ1LbQedf/O3Thl4hi2fXMFI1JaMU9EpFLhJGofUFHnIJVO05xOhnEoEZG6V7LENbMfm1mbmW2pdOcpN3BIpENdTVVEpK6V04v4Z2BFJTvta31YDtyNVEqJWkRkqEomand/AjhQyU77Wh+ezYJDMp0eWnQiIhLcrA8zu9bM1pvZ+rbDRwF44c0D+Yq6SYlaRGSoAkvU7r7a3Ze5+7LuXH6+9NZdh3GHcS1aiElEZKhCmS83Y3x+4aUb/nwe5GDaeM2fFhEZqnAmNvccAyC1exNgmHrUIiJDVs70vDXAH4GzzKzVzL5Y6j3eeRCA5Cu/zu8jqVkfIiJDVc6sj7909+nunnb3me5+R8n3jJkGQPJL/5p/IqWKWkRkqEJpfRzsOQJAqmUWAJbUpeMiIkMVSgbtyfWQtCSJXGEZvaQuHxcRGarQSt2f/fufQS4HgCWUqEVEhiq0RD2rZRZks/kfUkrUIiJDFUqiNoxR6VF4VhW1iMhwhXuWL5sBwFRRi4gMWaiJ2gs9al
RRi4gMWShXoiQcjj3zDL3v7AZUUYuIDEcoiXrscXj781/o/znR0hLGYUREGkI4FXUOSKWY/ZO7sKYRNC/4QBiHERFpCKEkagMsnWbUkiVh7F5EpKGEMz3PwXT7LRGRQIQ0jxotbSoiEpBwpuepohYRCUyoPeqBent7aW1tpaurK4xDBqa5uZmZM2eS1icCEYmJcBL1+1TUra2ttLS0MGfOHMwsjMMOm7uzf/9+WltbmTt3btThiIgAIfaoSZ+cqLu6upg0aVJskzSAmTFp0qTYV/0i0ljCm/WRbnrv8zFO0n1qIUYRaSyhrfWhk4kiIsEIsaKO58m4X/ziF5gZ27ZtizoUEZGyhDePOqYV9Zo1a1i+fDlr1qyJOhQRkbI01Dzqo0eP8tRTT3HHHXdwzz33RB2OiEhZqjaPeqBbHnqJl985EugxF8wYy99fsrDoNg888AArVqzgzDPPZNKkSWzYsIGlS5cGGoeISNBC61G/e3peHKxZs4YrrrgCgCuuuELtDxGpCaFl02IVdanKNwwHDhzgscceY/PmzZgZ2WwWM+O73/2upuSJSKyFuHpevGZ9rF27lquuuoq33nqLHTt2sHPnTubOncuTTz4ZdWgiIkWFlKg9dtPz1qxZw6pVq0567rLLLlP7Q0RiL7zWR8xmfTz++OPvee6GG26IIBIRkcroxgEiIjEX3iXkMWt9iIjUqhAvIVdFLSIShBCXOVVFLSIShIa6hFxEpBaFuMypKmoRkSA01MnEZDLJ4sWLWbRoEUuWLOHpp5+OOiQRkZIaZh41wMiRI9m4cSMAjz76KDfffDN/+MMfIo5KRKS48CrqpvhV1AMdOXKECRMmRB2GiEhJ0VTUD98EezYHe8BTPgSf+seim3R2drJ48WK6urrYvXs3jz32WLAxiIiEoKyK2sxWmNkrZvaamd1U1p5j3PrYtm0bjzzyCFdffTXuHnVYIiJFlcymZpYEfghcBLQCz5nZg+7+ctH3FTuZWKLyrYbzzjuPffv20d7eztSpU6MOR0RkUOVU1OcCr7n7G+7eA9wDXFrsDZl0guazzgoivtBs27aNbDbLpEmTog5FRKSocvoTpwI7B/zcCny02Bv2Tx/FiPnzhxNXKPp61ADuzl133UUymYw4KhGR4gJrJJvZtcC1AONmjwtqt4HKZrNRhyAiUrFyWh+7gFkDfp5ZeO4k7r7a3Ze5+7KmEU1BxSci0vDKSdTPAWeY2VwzawKuAB4MNywREelTsvXh7hkzux54FEgCP3b3l4q9x9DNYkVEglJWj9rdfwX8KuRYRETkfYR2CbmIiAQjpBsHqPUhIhKUhqqo9+zZwxVXXMG8efNYunQpK1euZPv27VGHJSJSVCgLcpjFr6J2d1atWsU111zDPffcA8CLL77I3r17OfPMMyOOTkRkcPFbOSkkjz/+OOl0muuuu67/uUWLFkUYkYhIecKpqEv0qG/9061sO7At0GOePfFs/vbcvx309S1btrB06dJAjykiUg3hnEyMYetDRKRWRdL6KFb5hmXhwoWsXbu26scVERmuhpmed+GFF9Ld3c3q1av7n9u0aRNPPvlkhFGJiJTWMNPzzIz777+f3/72t8ybN4+FCxdy8803c8opp0QdmohIUQ0zPQ9gxowZ3HvvvVGHISJSkYZpfYiI1KqGaX2IiNQqTc8TEYk5tT5ERGJOrQ8RkZhTRS0iEnMN1aNOJpMsXryYhQsXsmjRIr73ve+Ry+WiDktEpKiGWT0PYOTIkWzcuBGAtrY2Pve5z3HkyBFuueWWiCMTERlcw7Y+pk6dyurVq7n99ttx96jDEREZVCQV9Z5vfYvurcEuczriA2dzyje+UdF7Tj/9dLLZLG1tbUybNi3QeEREgtJQPWoRkVoUyY0DKq18w/LGG2+QTCaZOnVq1KGIiAyqYedRt7e3c91113H99dfrE4CIxFpDrZ7X2dnJ4sWL6e3tJZVKcdVVV3HjjTdGHZaISFENNT0vm81GHY
KISMVCaX20NLWEsVsRkYbUsPOoRURqRVVPJtbChSW1EKOINJaqJerm5mb2798f60To7uzfv5/m5uaoQxER6Ve1k4kzZ86ktbWV9vb2ah1ySJqbm5k5c2bUYYiI9Ktaok6n08ydO7dahxMRqRsNe8GLiEitUKIWEYk5JWoRkZizMGZhmFkH8EoZm44DDsd4uyiPXQu/y2RgX4DHrqe/TS0cu5IYyx3rIMe5km3rYfzOcvf3v1rQ3QN/AOvL3G51nLerhRgj/l3KGudy91lnf5vYH7vCGAP7f7oO/zaB7LPY3zjq1sdDMd8uymPXwu9SiXL2WU9/m1o4dtzHuZJt62n83iOs1sd6d18W+I4lVjTOjUNjHb5if+OwKurVIe1X4kXj3Dg01uEb9G8cSkUtIiLBibpHXZPM7GiJ139vZvqYWOM0zo0j7mOtRC0iEnNDTtSl/gWqd2Z2vpmtG/Dz7Wb2+QhDCk0jj7XGuXHEeaxVUYuIxNywErWZjTGz35nZ82a22cwuLTw/x8y2mtmPzOwlM/u1mY0MJmSJgsa6MWic42m4FXUXsMrdlwAXAN+zE7cgPwP4obsvBA4Blw3zWHGT4eS/X73fbaBRx1rj3BjjDDEe6+EmagO+ZWabgN8CpwLTCq+96e4bC99vAOYM81hx8xawwMxGmNl44BNRBxSyRh1rjXNjjDPEeKyHe+OAvwKmAEvdvdfMdnDiX6HuAdtlgbr4mGRmKaDb3Xea2b3AFuBN4IVoIwtdQ421xrkxxhlqY6yHm6jHAW2FAb0AmB1ATHG3EHgdwN2/Dnz93Ru4+/lVjqkaGm2sNc6NMc5QA2M9pETd9y8QcDfwkJltBtYD2wKMLXbM7DrgBuCrUcdSLY041hrnxhhnqJ2xHtIl5Ga2CPiRu58bfEgSJxrrxqBxjreKTyYW/gVaA/yP4MORONFYNwaNc/xpUSYRkZgrq6I2s1lm9riZvVyY7P43hecnmtlvzOzVwtcJhefNzH5gZq+Z2SYzWzJgX9cUtn/VzK4J59eSoQh4nB8xs0MDL8mV+AhqrM1ssZn9sbCPTWZ2eZS/V90q8xYy04Elhe9bgO3AAuA7wE2F528Cbi18vxJ4mPyczI8Bzxaenwi8Ufg6ofD9hHJveaNHuI+gxrnw2ieAS4B1Uf9eeoQ31sCZwBmF72cAu4HxUf9+9fYoq6J2993u/nzh+w5gK/mJ8JcCdxU2uwv4dOH7S4GfeN4zwHgzmw58EviNux9w94PAb4AV5cQg4QtwnHH33wEd1YxfyhfUWLv7dnd/tbCfd4A28vOwJUBDOZk4BzgHeBaY5u67Cy/t4cQVTKcCOwe8rbXw3GDPS8wMc5ylhgQ11mZ2LtBEYU6yBKeiRG1mY4CfA1919yMDX/P8Zx+dmawDGufGEdRYFz5J/RT4grvnAg+0wZWdqM0sTX5A73b3fyk8vbfvo27ha1vh+V3ArAFvn1l4brDnJSYCGmepAUGNtZmNBX4J/PdCW0QCVu6sDwPuALa6+z8NeOlBoG/mxjXAAwOev7pwpvhjwOHCx6lHgYvNbELhbPLFheckBgIcZ4m5oMbazJqA+8n3r9dWKfzGU84ZR2A5+Y9Am4CNhcdKYBLwO+BV8ittTSxsb8APyfeqNgPLBuzrPwGvFR5fiPpsqh6hjfOTQDvQSb6f+cmofz89gh9r4Eqgd8A+NgKLo/796u2hC15ERGJOt+ISEYk5JWoRkZhTohYRiTklahGRmFOiFhGJOSVqEZGYU6IWEYk5JWoRkZj7/+7KeQitM/04AAAAAElFTkSuQmCC\n",
"text/plain": [
""
]
},
"metadata": {
"needs_background": "light"
},
"output_type": "display_data"
}
],
"source": [
"kdf.plot()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Getting data in/out\n",
"See the Input/Output docs."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### CSV\n",
"\n",
"CSV is straightforward and easy to use. Use `DataFrame.to_csv` to write a CSV file and `ks.read_csv` to read one."
]
},
{
"cell_type": "code",
"execution_count": 51,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
" A B C D\n",
"0 0.976091 0.910572 -0.640756 0.034655\n",
"1 0.976091 0.910572 -0.150827 0.034655\n",
"2 0.976091 0.910572 0.796879 0.034655\n",
"3 0.976091 0.910572 0.849741 0.034655\n",
"4 0.976091 0.910572 0.849741 0.370709\n",
"5 0.976091 0.910572 0.849741 0.698402\n",
"6 0.976091 0.910572 1.217456 0.698402\n",
"7 0.976091 0.910572 1.217456 0.698402\n",
"8 0.976091 0.910572 1.217456 0.698402\n",
"9 0.976091 0.910572 1.217456 0.698402"
]
},
"execution_count": 51,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"kdf.to_csv('foo.csv')\n",
"ks.read_csv('foo.csv').head(10)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Parquet\n",
"\n",
"Parquet is an efficient, compact file format that is fast to read and write. Use `DataFrame.to_parquet` to write a Parquet file and `ks.read_parquet` to read one."
]
},
{
"cell_type": "code",
"execution_count": 52,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
" A B C D\n",
"0 0.976091 0.910572 -0.640756 0.034655\n",
"1 0.976091 0.910572 -0.150827 0.034655\n",
"2 0.976091 0.910572 0.796879 0.034655\n",
"3 0.976091 0.910572 0.849741 0.034655\n",
"4 0.976091 0.910572 0.849741 0.370709\n",
"5 0.976091 0.910572 0.849741 0.698402\n",
"6 0.976091 0.910572 1.217456 0.698402\n",
"7 0.976091 0.910572 1.217456 0.698402\n",
"8 0.976091 0.910572 1.217456 0.698402\n",
"9 0.976091 0.910572 1.217456 0.698402"
]
},
"execution_count": 52,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"kdf.to_parquet('bar.parquet')\n",
"ks.read_parquet('bar.parquet').head(10)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Spark IO\n",
"\n",
"In addition, Koalas fully supports Spark's various data sources, such as ORC and external data sources. Use `DataFrame.to_spark_io` to write to the specified data source and `ks.read_spark_io` to read from it."
]
},
{
"cell_type": "code",
"execution_count": 53,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
" A B C D\n",
"0 0.976091 0.910572 -0.640756 0.034655\n",
"1 0.976091 0.910572 -0.150827 0.034655\n",
"2 0.976091 0.910572 0.796879 0.034655\n",
"3 0.976091 0.910572 0.849741 0.034655\n",
"4 0.976091 0.910572 0.849741 0.370709\n",
"5 0.976091 0.910572 0.849741 0.698402\n",
"6 0.976091 0.910572 1.217456 0.698402\n",
"7 0.976091 0.910572 1.217456 0.698402\n",
"8 0.976091 0.910572 1.217456 0.698402\n",
"9 0.976091 0.910572 1.217456 0.698402"
]
},
"execution_count": 53,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"kdf.to_spark_io('zoo.orc', format=\"orc\")\n",
"ks.read_spark_io('zoo.orc', format=\"orc\").head(10)"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.7.8"
}
},
"nbformat": 4,
"nbformat_minor": 1
}