{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# 10 minutes to Koalas\n", "\n", "This is a short introduction to Koalas, geared mainly for new users. This notebook shows you some key differences between pandas and Koalas. You can run these examples yourself on a live notebook [here](https://mybinder.org/v2/gh/databricks/koalas/master?filepath=docs%2Fsource%2Fgetting_started%2F10min.ipynb). For Databricks Runtime, you can import and run [the current .ipynb file](https://raw.githubusercontent.com/databricks/koalas/master/docs/source/getting_started/10min.ipynb) out of the box. Try it on [Databricks Community Edition](https://community.cloud.databricks.com/) for free.\n", "\n", "Customarily, we import Koalas as follows:" ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "import pandas as pd\n", "import numpy as np\n", "import databricks.koalas as ks\n", "from pyspark.sql import SparkSession" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Object Creation\n", "\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Creating a Koalas Series by passing a list of values, letting Koalas create a default integer index:" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [], "source": [ "s = ks.Series([1, 3, 5, np.nan, 6, 8])" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "0 1.0\n", "1 3.0\n", "2 5.0\n", "3 NaN\n", "4 6.0\n", "5 8.0\n", "dtype: float64" ] }, "execution_count": 3, "metadata": {}, "output_type": "execute_result" } ], "source": [ "s" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Creating a Koalas DataFrame by passing a dict of objects that can be converted to series-like." 
] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [], "source": [ "kdf = ks.DataFrame(\n", " {'a': [1, 2, 3, 4, 5, 6],\n", " 'b': [100, 200, 300, 400, 500, 600],\n", " 'c': [\"one\", \"two\", \"three\", \"four\", \"five\", \"six\"]},\n", " index=[10, 20, 30, 40, 50, 60])" ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " a b c\n", "10 1 100 one\n", "20 2 200 two\n", "30 3 300 three\n", "40 4 400 four\n", "50 5 500 five\n", "60 6 600 six" ] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "source": [ "kdf" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Creating a pandas DataFrame by passing a numpy array, with a datetime index and labeled columns:" ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [], "source": [ "dates = pd.date_range('20130101', periods=6)" ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "DatetimeIndex(['2013-01-01', '2013-01-02', '2013-01-03', '2013-01-04',\n", " '2013-01-05', '2013-01-06'],\n", " dtype='datetime64[ns]', freq='D')" ] }, "execution_count": 7, "metadata": {}, "output_type": "execute_result" } ], "source": [ "dates" ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [], "source": [ "pdf = pd.DataFrame(np.random.randn(6, 4), index=dates, columns=list('ABCD'))" ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " A B C D\n", "2013-01-01 -0.407291 0.066551 -0.073149 0.648219\n", "2013-01-02 -0.848735 0.437277 0.632657 0.312861\n", "2013-01-03 -0.415537 -1.787072 0.242221 0.125543\n", "2013-01-04 -1.637271 1.134810 0.282532 0.133995\n", "2013-01-05 -1.230477 -1.925734 0.736288 -0.547677\n", "2013-01-06 1.092894 -1.071281 0.318752 -0.477591" ] }, "execution_count": 9, "metadata": {}, "output_type": "execute_result" } ], "source": [ "pdf" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now, this pandas DataFrame can be converted to a Koalas DataFrame" ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [], "source": [ "kdf = ks.from_pandas(pdf)" ] }, { "cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "databricks.koalas.frame.DataFrame" ] }, "execution_count": 11, "metadata": {}, "output_type": "execute_result" } ], "source": [ "type(kdf)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "It looks and behaves the same as a pandas DataFrame though" ] }, { "cell_type": "code", "execution_count": 12, "metadata": { "scrolled": true }, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " A B C D\n", "2013-01-01 -0.407291 0.066551 -0.073149 0.648219\n", "2013-01-02 -0.848735 0.437277 0.632657 0.312861\n", "2013-01-03 -0.415537 -1.787072 0.242221 0.125543\n", "2013-01-04 -1.637271 1.134810 0.282532 0.133995\n", "2013-01-05 -1.230477 -1.925734 0.736288 -0.547677\n", "2013-01-06 1.092894 -1.071281 0.318752 -0.477591" ] }, "execution_count": 12, "metadata": {}, "output_type": "execute_result" } ], "source": [ "kdf" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Also, it is possible to create a Koalas DataFrame from a Spark DataFrame.\n", "\n", "Creating a Spark DataFrame from a pandas DataFrame:" ] }, { "cell_type": "code", "execution_count": 13, "metadata": {}, "outputs": [], "source": [ "spark = SparkSession.builder.getOrCreate()" ] }, { "cell_type": "code", "execution_count": 14, "metadata": {}, "outputs": [], "source": [ "sdf = spark.createDataFrame(pdf)" ] }, { "cell_type": "code", "execution_count": 15, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "+--------------------+-------------------+--------------------+-------------------+\n", "| A| B| C| D|\n", "+--------------------+-------------------+--------------------+-------------------+\n", "|-0.40729126067930577|0.06655086061836445|-0.07314878758440578| 0.6482187447085683|\n", "| -0.848735274668907|0.43727685786558224| 0.6326566086816865| 0.312860815784838|\n", "|-0.41553692955141575|-1.7870717259038067| 0.24222142308402184| 0.125543462922973|\n", "| -1.637270523583917| 1.1348099198020765| 0.2825324338895592|0.13399483028402598|\n", "| -1.2304766522352943|-1.9257342346663335| 0.7362879432261002|-0.5476765308367703|\n", "| 1.0928943198263723|-1.0712812856772376| 0.31875224896792975|-0.4775906715060247|\n", "+--------------------+-------------------+--------------------+-------------------+\n", "\n" ] } ], "source": [ "sdf.show()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Creating a Koalas DataFrame from a Spark DataFrame.\n", "`to_koalas()` is automatically attached to Spark DataFrames and is available as an API once Koalas is imported." ] }, { "cell_type": "code", "execution_count": 16, "metadata": {}, "outputs": [], "source": [ "kdf = sdf.to_koalas()" ] }, { "cell_type": "code", "execution_count": 17, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " A B C D\n", "0 -0.407291 0.066551 -0.073149 0.648219\n", "1 -0.848735 0.437277 0.632657 0.312861\n", "2 -0.415537 -1.787072 0.242221 0.125543\n", "3 -1.637271 1.134810 0.282532 0.133995\n", "4 -1.230477 -1.925734 0.736288 -0.547677\n", "5 1.092894 -1.071281 0.318752 -0.477591" ] }, "execution_count": 17, "metadata": {}, "output_type": "execute_result" } ], "source": [ "kdf" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Columns have specific [dtypes](http://pandas.pydata.org/pandas-docs/stable/basics.html#basics-dtypes). Types that are common to both Spark and pandas are currently supported." ] }, { "cell_type": "code", "execution_count": 18, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "A float64\n", "B float64\n", "C float64\n", "D float64\n", "dtype: object" ] }, "execution_count": 18, "metadata": {}, "output_type": "execute_result" } ], "source": [ "kdf.dtypes" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Viewing Data\n", "\n", "See the [API Reference](https://koalas.readthedocs.io/en/latest/reference/index.html)." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "See the top rows of the frame. The results may differ from pandas, though: unlike pandas, the data in a Spark DataFrame is not _ordered_ and has no intrinsic notion of an index. When asked for the head of a DataFrame, Spark just takes the requested number of rows from a partition. Do not rely on it to return specific rows; use `.loc` or `.iloc` instead." ] }, { "cell_type": "code", "execution_count": 19, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " A B C D\n", "0 -0.407291 0.066551 -0.073149 0.648219\n", "1 -0.848735 0.437277 0.632657 0.312861\n", "2 -0.415537 -1.787072 0.242221 0.125543\n", "3 -1.637271 1.134810 0.282532 0.133995\n", "4 -1.230477 -1.925734 0.736288 -0.547677" ] }, "execution_count": 19, "metadata": {}, "output_type": "execute_result" } ], "source": [ "kdf.head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Display the index, columns, and the underlying NumPy data.\n", "\n", "You can also retrieve the index; the index column can be assigned to a DataFrame (see later)." ] }, { "cell_type": "code", "execution_count": 20, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "Int64Index([0, 1, 2, 3, 4, 5], dtype='int64')" ] }, "execution_count": 20, "metadata": {}, "output_type": "execute_result" } ], "source": [ "kdf.index" ] }, { "cell_type": "code", "execution_count": 21, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "Index(['A', 'B', 'C', 'D'], dtype='object')" ] }, "execution_count": 21, "metadata": {}, "output_type": "execute_result" } ], "source": [ "kdf.columns" ] }, { "cell_type": "code", "execution_count": 22, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([[-0.40729126, 0.06655086, -0.07314879, 0.64821874],\n", " [-0.84873527, 0.43727686, 0.63265661, 0.31286082],\n", " [-0.41553693, -1.78707173, 0.24222142, 0.12554346],\n", " [-1.63727052, 1.13480992, 0.28253243, 0.13399483],\n", " [-1.23047665, -1.92573423, 0.73628794, -0.54767653],\n", " [ 1.09289432, -1.07128129, 0.31875225, -0.47759067]])" ] }, "execution_count": 22, "metadata": {}, "output_type": "execute_result" } ], "source": [ "kdf.to_numpy()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "`describe()` shows a quick statistical summary of your data:" ] }, { "cell_type": "code", "execution_count": 23, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " A B C D\n", "count 6.000000 6.000000 6.000000 6.000000\n", "mean -0.574403 -0.524242 0.356550 0.032558\n", "std 0.945349 1.255721 0.291566 0.463350\n", "min -1.637271 -1.925734 -0.073149 -0.547677\n", "25% -1.230477 -1.787072 0.242221 -0.477591\n", "50% -0.848735 -1.071281 0.282532 0.125543\n", "75% -0.407291 0.437277 0.632657 0.312861\n", "max 1.092894 1.134810 0.736288 0.648219" ] }, "execution_count": 23, "metadata": {}, "output_type": "execute_result" } ], "source": [ "kdf.describe()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Transposing your data" ] }, { "cell_type": "code", "execution_count": 24, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " 0 1 2 3 4 5\n", "A -0.407291 -0.848735 -0.415537 -1.637271 -1.230477 1.092894\n", "B 0.066551 0.437277 -1.787072 1.134810 -1.925734 -1.071281\n", "C -0.073149 0.632657 0.242221 0.282532 0.736288 0.318752\n", "D 0.648219 0.312861 0.125543 0.133995 -0.547677 -0.477591" ] }, "execution_count": 24, "metadata": {}, "output_type": "execute_result" } ], "source": [ "kdf.T" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Sorting by its index" ] }, { "cell_type": "code", "execution_count": 25, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " A B C D\n", "5 1.092894 -1.071281 0.318752 -0.477591\n", "4 -1.230477 -1.925734 0.736288 -0.547677\n", "3 -1.637271 1.134810 0.282532 0.133995\n", "2 -0.415537 -1.787072 0.242221 0.125543\n", "1 -0.848735 0.437277 0.632657 0.312861\n", "0 -0.407291 0.066551 -0.073149 0.648219" ] }, "execution_count": 25, "metadata": {}, "output_type": "execute_result" } ], "source": [ "kdf.sort_index(ascending=False)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Sorting by value" ] }, { "cell_type": "code", "execution_count": 26, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " A B C D\n", "4 -1.230477 -1.925734 0.736288 -0.547677\n", "2 -0.415537 -1.787072 0.242221 0.125543\n", "5 1.092894 -1.071281 0.318752 -0.477591\n", "0 -0.407291 0.066551 -0.073149 0.648219\n", "1 -0.848735 0.437277 0.632657 0.312861\n", "3 -1.637271 1.134810 0.282532 0.133995" ] }, "execution_count": 26, "metadata": {}, "output_type": "execute_result" } ], "source": [ "kdf.sort_values(by='B')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Missing Data\n", "Koalas primarily uses the value `np.nan` to represent missing data. It is by default not included in computations. \n" ] }, { "cell_type": "code", "execution_count": 27, "metadata": {}, "outputs": [], "source": [ "pdf1 = pdf.reindex(index=dates[0:4], columns=list(pdf.columns) + ['E'])" ] }, { "cell_type": "code", "execution_count": 28, "metadata": {}, "outputs": [], "source": [ "pdf1.loc[dates[0]:dates[1], 'E'] = 1" ] }, { "cell_type": "code", "execution_count": 29, "metadata": {}, "outputs": [], "source": [ "kdf1 = ks.from_pandas(pdf1)" ] }, { "cell_type": "code", "execution_count": 30, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " A B C D E\n", "2013-01-01 -0.407291 0.066551 -0.073149 0.648219 1.0\n", "2013-01-02 -0.848735 0.437277 0.632657 0.312861 1.0\n", "2013-01-03 -0.415537 -1.787072 0.242221 0.125543 NaN\n", "2013-01-04 -1.637271 1.134810 0.282532 0.133995 NaN" ] }, "execution_count": 30, "metadata": {}, "output_type": "execute_result" } ], "source": [ "kdf1" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "To drop any rows that have missing data." ] }, { "cell_type": "code", "execution_count": 31, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " A B C D E\n", "2013-01-01 -0.407291 0.066551 -0.073149 0.648219 1.0\n", "2013-01-02 -0.848735 0.437277 0.632657 0.312861 1.0" ] }, "execution_count": 31, "metadata": {}, "output_type": "execute_result" } ], "source": [ "kdf1.dropna(how='any')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Filling missing data." ] }, { "cell_type": "code", "execution_count": 32, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " A B C D E\n", "2013-01-01 -0.407291 0.066551 -0.073149 0.648219 1.0\n", "2013-01-02 -0.848735 0.437277 0.632657 0.312861 1.0\n", "2013-01-03 -0.415537 -1.787072 0.242221 0.125543 5.0\n", "2013-01-04 -1.637271 1.134810 0.282532 0.133995 5.0" ] }, "execution_count": 32, "metadata": {}, "output_type": "execute_result" } ], "source": [ "kdf1.fillna(value=5)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Operations" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Stats\n", "Operations in general exclude missing data.\n", "\n", "Performing a descriptive statistic:" ] }, { "cell_type": "code", "execution_count": 33, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "A -0.574403\n", "B -0.524242\n", "C 0.356550\n", "D 0.032558\n", "dtype: float64" ] }, "execution_count": 33, "metadata": {}, "output_type": "execute_result" } ], "source": [ "kdf.mean()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Spark Configurations\n", "\n", "Various PySpark configurations can be applied internally in Koalas.\n", "For example, you can enable Arrow optimization to greatly speed up internal pandas conversion. See the PySpark Usage Guide for Pandas with Apache Arrow." ] }, { "cell_type": "code", "execution_count": 34, "metadata": {}, "outputs": [], "source": [ "prev = spark.conf.get(\"spark.sql.execution.arrow.enabled\") # Remember the current value so it can be restored.\n", "ks.set_option(\"compute.default_index_type\", \"distributed\") # Use the distributed default index to prevent overhead.\n", "import warnings\n", "warnings.filterwarnings(\"ignore\") # Ignore warnings coming from Arrow optimizations." ] }, { "cell_type": "code", "execution_count": 35, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "493 ms ± 157 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)\n" ] } ], "source": [ "spark.conf.set(\"spark.sql.execution.arrow.enabled\", True)\n", "%timeit ks.range(300000).to_pandas()" ] }, { "cell_type": "code", "execution_count": 36, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "1.39 s ± 109 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)\n" ] } ], "source": [ "spark.conf.set(\"spark.sql.execution.arrow.enabled\", False)\n", "%timeit ks.range(300000).to_pandas()" ] }, { "cell_type": "code", "execution_count": 37, "metadata": {}, "outputs": [], "source": [ "ks.reset_option(\"compute.default_index_type\")\n", "spark.conf.set(\"spark.sql.execution.arrow.enabled\", prev) # Restore the previous value." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Grouping\n", "By “group by” we are referring to a process involving one or more of the following steps:\n", "\n", "- Splitting the data into groups based on some criteria\n", "- Applying a function to each group independently\n", "- Combining the results into a data structure" ] }, { "cell_type": "code", "execution_count": 38, "metadata": {}, "outputs": [], "source": [ "kdf = ks.DataFrame({'A': ['foo', 'bar', 'foo', 'bar',\n", " 'foo', 'bar', 'foo', 'foo'],\n", " 'B': ['one', 'one', 'two', 'three',\n", " 'two', 'two', 'one', 'three'],\n", " 'C': np.random.randn(8),\n", " 'D': np.random.randn(8)})" ] }, { "cell_type": "code", "execution_count": 39, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " A B C D\n", "0 foo one 1.028745 -0.804571\n", "1 bar one 0.593379 -1.592110\n", "2 foo two 0.051362 0.466273\n", "3 bar three 0.977622 -0.822670\n", "4 foo two -1.105357 -0.027466\n", "5 bar two -0.009076 0.977587\n", "6 foo one 0.643092 0.403405\n", "7 foo three -1.451129 0.230347" ] }, "execution_count": 39, "metadata": {}, "output_type": "execute_result" } ], "source": [ "kdf" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Grouping and then applying the [sum()](https://koalas.readthedocs.io/en/latest/reference/api/databricks.koalas.groupby.GroupBy.sum.html#databricks.koalas.groupby.GroupBy.sum) function to the resulting groups." ] }, { "cell_type": "code", "execution_count": 40, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " C D\n", "A \n", "bar 1.561925 -1.437193\n", "foo -0.833286 0.267988" ] }, "execution_count": 40, "metadata": {}, "output_type": "execute_result" } ], "source": [ "kdf.groupby('A').sum()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Grouping by multiple columns forms a hierarchical index, and again we can apply the sum function." ] }, { "cell_type": "code", "execution_count": 41, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " C D\n", "A B \n", "bar one 0.593379 -1.592110\n", " three 0.977622 -0.822670\n", " two -0.009076 0.977587\n", "foo one 1.671837 -0.401166\n", " three -1.451129 0.230347\n", " two -1.053995 0.438807" ] }, "execution_count": 41, "metadata": {}, "output_type": "execute_result" } ], "source": [ "kdf.groupby(['A', 'B']).sum()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Plotting\n", "See the Plotting docs." ] }, { "cell_type": "code", "execution_count": 42, "metadata": {}, "outputs": [], "source": [ "%matplotlib inline\n", "from matplotlib import pyplot as plt" ] }, { "cell_type": "code", "execution_count": 43, "metadata": {}, "outputs": [], "source": [ "pser = pd.Series(np.random.randn(1000),\n", " index=pd.date_range('1/1/2000', periods=1000))" ] }, { "cell_type": "code", "execution_count": 44, "metadata": {}, "outputs": [], "source": [ "kser = ks.Series(pser)" ] }, { "cell_type": "code", "execution_count": 45, "metadata": {}, "outputs": [], "source": [ "kser = kser.cummax()" ] }, { "cell_type": "code", "execution_count": 46, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "" ] }, "execution_count": 46, "metadata": {}, "output_type": "execute_result" }, { "data": { "image/png": 
"iVBORw0KGgoAAAANSUhEUgAAAXQAAAEECAYAAAA4Qc+SAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADh0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uMy4xLjEsIGh0dHA6Ly9tYXRwbG90bGliLm9yZy8QZhcZAAAStElEQVR4nO3dfYwd1X3G8efxC2/hxQZvwQGbrRq3DaDykpVLRP8goSGUpqFVqEpIE6CVrEapAlKqKEojUPmjEq1KpQhSZEIUgxAKAkrAhbYuJQ2owdHaMSZgyksIAeLENi/GTozD7vz6x51dlvWs78vOvXfmzPcjXe3dmeOZc/fIz549M+eMI0IAgPpbMOwKAADKQaADQCIIdABIBIEOAIkg0AEgEQQ6ACRiUbsCtg+T9F1Jh+bl74qIa2aVuVzSP0p6Jd90Q0R8/WDHXbZsWYyOjvZQZQBork2bNu2KiJGifW0DXdJ+SR+OiL22F0t61PaDEfHYrHLfioi/7rRSo6OjGh8f77Q4AECS7Rfn2tc20KM182hv/u3i/MVsJAComI7G0G0vtL1F0g5JGyJiY0GxT9jeavsu2ytKrSUAoK2OAj0iJiPiDEknSVpt+7RZRe6XNBoRvyNpg6R1Rcexvcb2uO3xnTt3zqfeAIBZurrLJSLekPSwpAtmbX81Ivbn335d0gfm+PdrI2IsIsZGRgrH9AEAPWob6LZHbC/J3x8u6SOSnp5VZvmMbz8uaVuZlQQAtNfJXS7LJa2zvVCtXwB3RsR629dKGo+I+yR93vbHJU1Iek3S5f2qMACgmIe1fO7Y2Fhw2yIweFkW2rFnf/uCqKTlSw7fFBFjRfs66aEDSMg19z2p2x6b81Zm1BiBDjTMz958SyccfZiu/P1Vw64KenDpdXPvI9CBhomQjn3PIfrk6pXDrgp6cOlB9rE4F9AwESF72LVAPxDoQMOERKAnikAHGiYiZJHoKSLQgYYJSQvI8yQR6EDDZIy5JItABxqmNeSCFBHoQAMx5JImAh1omCxCZsglSQQ60DARYsglUQQ60DAR0gJ66Eki0IGGCdFFTxWBDjRMRp4ni0AHmia4DT1VBDrQMKFgDD1RBDrQMBk99GQR6EDDsDhXugh0oGFYyiVdBDrQMK0hFxI9RQQ60DQszpUsAh1oGIZc0kWgAw3D1P90EehAw2QMuSSLQAcaJrgPPVkEOtAwrTF0Ej1FBDrQMDyCLl0EOtAwDLmki0AHGibE1P9UEehAw0RIC/ifnySaFWiYjMW5kkWgAw0TEo8sShSBDjQNM0WTRaADDcNM0XQR6EDDsDhXugh0oGFYnCtdBDrQMAy5pKttoNs+zPb3bT9u+0nbf1dQ5lDb37L9nO2Ntkf7UVkA8xch7nJJVCc99P2SPhwRp0s6Q9IFts+eVeYvJb0eEe+T9M+Sriu3mgDKxH3oaWob6NGyN/92cf6KWcUukrQuf3+XpPPMcm5AJUWEFvC/M0kdjaHbXmh7i6QdkjZExMZZRU6U9JIkRcSEpN2SjiuzogDKkbE4V7I6CvSImIyIMySdJGm17dN6OZntNbbHbY/v3Lmzl0MAmCcW50pXV3e5RMQbkh6WdMGsXa9IWiFJthdJOkbSqwX/fm1EjEXE2MjISG81BjAvLM6Vrk7uchmxvSR/f7ikj0h6elax+yRdlr+/WNJ/R8TscXYAFZCxmEuyFnVQZrmkdbYXqvUL4M6IWG/7WknjEXGfpFsk3Wb7OUmvSbqkbzUGME/BGHqi2gZ6RGyVdGbB9qtnvH9L0p+WWzUA/RBB/zxVjKQBDRNi6n+qCHSgYbJgyCVVBDrQMAy5pItABxomIsRE7jQR6EDDBDNFk0WgAw3TWmyRRE8RgQ40DItzpYtABxqGxbnSRaADDRPiomiqCHSgYbhtMV0EOtAwIdFDTxSBDjRMMFM0WQQ60DAMuaSLQAcahsW50kWgAw3D4lzpItCBhmHIJV0EOtB
EdNGTRKADDTL1qF+m/qeJQAcaJMsf3c7iXGnq5CHRGKJvb3lF//PMzmFXA4mIqUAnz5NEoFfc1x5+Xi++9gstO/LQYVcFiTj5uCN0+oolw64G+oBAr7iJLNN5v328bvzUWcOuCoCKYwy94iaz0EKuYAHoAIFecRNZaBGBDqADBHrFZVloAYEOoAMEesXRQwfQKQK94hhDB9ApAr3i6KED6BSBXnGtHjrNBKA9kqLiJrJMixbSQwfQHoFecZNZ8DACAB0h0CtukjF0AB0i0Cssy0JZiLtcAHSEQK+wyXxpPHroADpBoFfYZL549UIuigLoAIFeYRMZPXQAnSPQK2xyMu+hcx86gA6QFBU2NYbOiAuATrQNdNsrbD9s+ynbT9q+sqDMubZ3296Sv67uT3WbZSLLJEkLF/J7F0B7nTyxaELSFyJis+2jJG2yvSEinppV7pGI+Fj5VWyuScbQAXShbdcvIrZHxOb8/R5J2ySd2O+KQZqYHkMn0AG019UzRW2PSjpT0saC3R+0/bikn0r6m4h4ct61K7DvV5P63o92TYddynbs2S+JHjqAznQc6LaPlHS3pKsi4s1ZuzdLOjki9tq+UNK9klYVHGONpDWStHLlyp4qfMf3f6Jr188e7Unb0iMOGXYVANRAR4Fue7FaYX57RNwze//MgI+IB2x/zfayiNg1q9xaSWslaWxsrKcu9r63JyVJ937unEb0XA9bvEC/MXLksKsBoAbaBrptS7pF0raIuH6OMidI+nlEhO3Vao3Nv1pqTXNZfqHwtPcerUXc/QEA0zrpoZ8j6dOSnrC9Jd/2ZUkrJSkibpJ0saTP2p6QtE/SJRHRl0HuPM9llpQFgHdpG+gR8aikg6ZnRNwg6YayKnUwWf57ogGjLQDQldqNWUx1++mhA8C71S/QI0SWA8CBahfoWfBINgAoUrtAj2D8HACK1C7Qs5B88Gu0ANBItQt0xtABoFj9Al1iDB0ACtQu0LOMHjoAFKlfoAc9dAAoUrtAD9FDB4Ai9Qt0eugAUKh2gZ5xlwsAFKploNNDB4AD1S7QmSkKAMVqF+it9dBJdACYrXaBHhH00AGgQA0DnbtcAKBI7QKdu1wAoFgNA50eOgAUqV2gM1MUAIrVL9BDBDoAFKhdoDOxCACK1S7QucsFAIrVLtC5ywUAitUu0COYJwoARWoX6IyhA0Cx2gU6Y+gAUKx2gc4YOgAUq2GgSybRAeAAtQt0idUWAaBI7QI9Y6YoABSqYaBzlwsAFKldoAdj6ABQqHaBnkUwsQgACtQu0HlINAAUq1+gizF0AChSu0DPMmaKAkCR+gU6q3MBQKG2gW57he2HbT9l+0nbVxaUse2v2n7O9lbbZ/WnuoyhA8BcFnVQZkLSFyJis+2jJG2yvSEinppR5g8krcpfvyvpX/KvpWuNodfuDwsA6Lu2yRgR2yNic/5+j6Rtkk6cVewiSbdGy2OSltheXnptxUxRAJhLV11d26OSzpS0cdauEyW9NOP7l3Vg6JeCmaIAUKzjQLd9pKS7JV0VEW/2cjLba2yP2x7fuXNnL4dgpigAzKGjQLe9WK0wvz0i7iko8oqkFTO+Pynf9i4RsTYixiJibGRkpJf6KpgpCgCFOrnLxZJukbQtIq6fo9h9kj6T3+1ytqTdEbG9xHpOy7jLBQAKdXKXyzmSPi3pCdtb8m1flrRSkiLiJkkPSLpQ0nOSfinpivlU6n+f36U3971duG/3vrf1a0cdOp/DA0CS2gZ6RDyqNlN5IiIkfa6MCv141y906c2zr7m+29jo0jJOBQBJ6aSHPlC//NWkJOkrf/h+nfO+ZYVlfn3ZewZZJQCohcoFehYhSVpx7BF6//Kjh1wbAKiPyk655F5zAOhO5QJ9qodOnANAdyoX6Hmea0HlagYA1Va52Hynh04fHQC6UblAzzvoLMAFAF2qXqBP9dBJdADoSgUDvfWV6f0A0J3KBXqWBzpj6ADQncoF+tSQCz10AOhO5QI
9m74qOtRqAEDtVC7QQ1M9dBIdALpRvUCfHkMHAHSjsoG+gEF0AOhK5QKdtVwAoDeVC/R3ZooS6QDQjcoF+nQPnTwHgK5ULtDFRVEA6EnlAj0LblsEgF5ULtCnb1skzwGgK5ULdHroANCbygV6tC8CAChQvUCnhw4APalgoLe+kucA0J3KBXo2/YALEh0AulG5QJ9abZE8B4DuVC7QMx5BBwA9qVygT10UZa4oAHSncoE+hR46AHSncoH+zuJcJDoAdKNygR6MoQNATyoX6Nn0aoskOgB0o3KBHqyHDgA9qWCgt74S6ADQneoFuljLBQB6UblAz+ihA0BPKhfowVouANCTtoFu+xu2d9j+4Rz7z7W92/aW/HX1fCo0fR/6fA4CAA20qIMy35R0g6RbD1LmkYj4WBkVmp74Tw8dALrStoceEd+V9NoA6jJ1PkmMoQNAt8oaQ/+g7cdtP2j71PkciDF0AOhNJ0Mu7WyWdHJE7LV9oaR7Ja0qKmh7jaQ1krRy5crCgzGGDgC9mXcPPSLejIi9+fsHJC22vWyOsmsjYiwixkZGRuY4Xl4xeugA0JV5B7rtE5xfwbS9Oj/mq70eL5ueKjrfmgFAs7QdcrF9h6RzJS2z/bKkayQtlqSIuEnSxZI+a3tC0j5Jl8Q7T6noGastAkB32gZ6RHyyzf4b1LqtsRSshw4AvanwTNHh1gMA6qZygc566ADQm8oF+tRqi4y4AEB3qhforLYIAD2pYKCzHjoA9KJygZ5xGzoA9KRygc5MUQDoTeUCPWO1RQDoSeUCnfXQAaA3Zay22JNde/frG4++cMD2H/zkdXrnANCDoQX69t1v6dr1TxXuO2np4QOuDQDU39AC/ZTlR+s7V59fuO/wQxYOuDYAUH9DC/SFC6xjjlg8rNMDQHIqd1EUANAbAh0AEkGgA0AiCHQASASBDgCJINABIBEEOgAkwlPrjw/8xPYeSf/XYfFjJO0uoUy3ZYdVbpjn7sdnWSZp1xDOTfsN9pidtnOnx0zpZ1PmuX8rIo4q3BMRQ3lJGu+i7NoyynRbdljl6lDHLj9LR21d9c+SUvv16dxD+T9dk59Naec+2M+5LkMu95dUptuywyo3zHP347N0quqfJaX269cxyzx3Sj+bfpz7AMMcchmPiLGhnBwDRVs3A+08GAf7OQ+zh752iOfGYNHWzUA7D8acP+eh9dABAOWqyxh6Ldne22b/d2zzJ2rN0c7NUId2JtABIBF9D/R2v9VSZ/tc2+tnfH+D7cuHWKW+aXJb087NUPV2pocOAIkYSKDbPtL2Q7Y3237C9kX59lHb22zfbPtJ2/9pmweK1hht3Qy0czUNqof+lqQ/iYizJH1I0j/Zdr5vlaQbI+JUSW9I+sSA6jQoE3r3z/mwYVVkQJra1rQz7Tx0gwp0S/p721sl/ZekEyUdn+97ISK25O83SRodUJ0G5UVJp9g+1PYSSecNu0J91tS2pp1p56Eb1EOiPyVpRNIHIuJt2z/WO7/Z9s8oNykpiT/PbC+StD8iXrJ9p6QfSnpB0g+GW7O+a1Rb086083Br9m6DCvRjJO3IG/5Dkk4e0HmH6VRJz0tSRHxR0hdnF4iIcwdcp0FoWlvTzrSz8u3nDrhOB+hroE/9VpN0u6T7bT8haVzS0/0877DZ/itJn5d01bDrMihNbGvamXaumr5O/bd9uqSbI2J1306CSqCtm4F2rra+XRTNf6vdIekr/ToHqoG2bgbaufpYnAsAElFaD932CtsP234qn1BwZb79WNsbbD+bf12ab7ftr9p+zvZW22fNONZleflnbV9WVh1RjpLb+t9tvzFzOjWqoax2tn2G7e/lx9hq+8+G+bmS1uljkTp4bNJySWfl74+S9IykUyT9g6Qv5du/JOm6/P2Fkh5U637WsyVtzLcfK+lH+del+fulZdWTV3XaOt93nqQ/krR+2J+LV3/aWdJvSlqVv3+vpO2Slgz786X4Kq2HHhHbI2Jz/n6PpG1
qTTa4SNK6vNg6SX+cv79I0q3R8pikJbaXS/qopA0R8VpEvC5pg6QLyqon5q/EtlZEPCRpzyDrj86U1c4R8UxEPJsf56eSdqh1DztK1peLorZHJZ0paaOk4yNie77rZ3pnNtmJkl6a8c9ezrfNtR0VNM+2Rk2U1c62V0s6RPk93ShX6YFu+0hJd0u6KiLenLkvWn9zcRU2EbR1M5TVzvlfZbdJuiIistIrinID3fZitRr+9oi4J9/886k/r/OvO/Ltr0haMeOfn5Rvm2s7KqSktkbFldXOto+W9G+S/jYfjkEflHmXiyXdImlbRFw/Y9d9kqbuVLlM0rdnbP9MfmX8bEm78z/j/kPS+baX5lfPz8+3oSJKbGtUWFntbPsQSf+q1vj6XQOqfjOVdXVV0u+p9afXVklb8teFko6T9JCkZ9Vale3YvLwl3ajWWNoTksZmHOsvJD2Xv64Y9pVjXn1t60ck7ZS0T60x148O+/PxKredJf25pLdnHGOLpDOG/flSfDGxCAASwSPoACARBDoAJIJAB4BEEOgAkAgCHQASQaADQCIIdABIBIEOAIn4f3X9wryKje5GAAAAAElFTkSuQmCC\n", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "kser.plot()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "On a DataFrame, the plot() method is a convenience to plot all of the columns with labels:" ] }, { "cell_type": "code", "execution_count": 47, "metadata": {}, "outputs": [], "source": [ "pdf = pd.DataFrame(np.random.randn(1000, 4), index=pser.index,\n", " columns=['A', 'B', 'C', 'D'])" ] }, { "cell_type": "code", "execution_count": 48, "metadata": {}, "outputs": [], "source": [ "kdf = ks.from_pandas(pdf)" ] }, { "cell_type": "code", "execution_count": 49, "metadata": {}, "outputs": [], "source": [ "kdf = kdf.cummax()" ] }, { "cell_type": "code", "execution_count": 50, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "" ] }, "execution_count": 50, "metadata": {}, "output_type": "execute_result" }, { "data": { "image/png": "iVBORw0KGgoAAAANSUhEUgAAAWoAAAEECAYAAAABJn7JAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADh0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uMy4xLjEsIGh0dHA6Ly9tYXRwbG90bGliLm9yZy8QZhcZAAAcC0lEQVR4nO3de5SU9Z3n8fe3Lk1zae4XQRAQvASSgEBMzHB21KyGsOsaxk10Ml6SzaxxN8bJMWczmt2dOZ7MScbEzG5yzDm75BjH5LgYJWNUEjUXTdQxGkERUBBvKI1AN/cG+lZV3/2jqptG7bp0P089T1V9XufU6e6qp57n2/3TL9/6Pr/n95i7IyIi8ZWIOgARESlOiVpEJOaUqEVEYk6JWkQk5pSoRURiTolaRCTmUmHsdPLkyT5nzpwwdi0iUpc2bNiwz92nvN9roSTqOXPmsH79+jB2LSJSl8zsrcFeU+tDRCTmlKhFRGJOiVpEJOaUqEVEYk6JWkQk5pSoRURiLpTpeSIiUct1dZE9fDjqMAKhRC0idenNT6+iZ8eOqMMIhBK1iNSl3t27Gb18OS0XXxR1KOW5/PJBX1KiFpG65JkMzQsXMuGzn406lPIUSdQ6mSgidcdzOchmsXQ66lACoUQtInXHMxkALFUfTQMlahGpO97TC6CKWkQktjJ9iVoVtYhILHmvKmoRkVjr61GjHrWISDypohYRibkTsz6UqEVEYkkVtYhIzHlvoaJWohYRiSfv7QE0PU9EJL4y9VVR18c/NyI1ILNvH4fuu6//Y7mEp3dXK1A/l5DXx28hUgMOP7SO9u//IOowGkZizBjSM2ZEHUYglKhFqiR7+BAkEpz90hbMLOpwpIaoRy1SJbmOoyTGjFGSloopUYtUSe5oB8kxY6IOQ2qQErVIlWQ7jpJoaYk6DKlBJXvUZtYMPAGMKGy/1t3/PuzARGpB7tgxdl73X8geOlRy255du2he8IEqRCX1ppyTid3Ahe5+1MzSwFNm9rC7PxNybCKx19O6i+PPPUfzog+Tnjqt6LZNc+YwduWnqhSZ1JOSidrdHT
ha+DFdeHiYQYnUCi8sUD/5S1+i5cILI45G6lVZPWozS5rZRqAN+I27PxtuWCI1om/xnzq5sELiqaxE7e5Zd18MzATONbMPvnsbM7vWzNab2fr29vag4xSJJa+zS5Ulniqa9eHuh4DHgRXv89pqd1/m7sumTJkSVHwiseaqqKUKypn1MQXodfdDZjYSuAi4NfTIRN7l6df2cc2df6I3G59TJEv3buMfgL/40XNs+3lb1OFInSqnDJgO3GVmSfIV+L3uvi7csCrn7vy3tZt4vf1o6Y2lJu053EVTMsF/Pf/0qEPpN2XzAfgjfOajczhy2vyow5EadmOR8recWR+bgHMCjCcUmZyzdkMrsyeN4rSJo6IOR0Iwf+oYzj9rKl9cPjfqUPodye1gF/BXy+fTfNaZUYcjNezGIq/VTWMtm8t/HP7ssll8+QJVNlFyd37y8k842HUw8H13AP97Q+C7HbJTtr/GIuCn2+/m2NEJUYcjdapuEnXO84k6mdCCN1Hb2bGT29bfRtKSJKy+VylY/naGRcDaNx+g/VB9/64SnbpJ1H0VdVIrk0WuO9sNwHf+zXe4eM7FEUcTroP33ceeh/6OdZ95mPT06VGHIzXMrh48d8UvUWe6YcdTkKvsLhiJ7gwXJF5g9oF9sP3VkIKTcvQezd9dI71nC/TEZ4ZGKHZtAsDe/lfo0IJLEo74JerN98EDX674baOBO5uAjYWHRKZ3RBPMOIX0H26Fzq6owwmVvzIaGIf94q9hRJ3/oySRiV+i7u7If736ARhRfoVy4HgPn//xc1x3/jxWfvCUkIKTcmQOvgIbvk1q5W0wcUHU4YTK7/0lvLAG++IvYWRz1OFILbtl2aAvxS9R57L5rzPOgeZxZb+t53AXm/wAh8Z/CE49LaTggtPz9tt0vbw16jBC4Yf289FtOZqbj3Jk3L6owwlV165CYXHaR6CpKdpgpG7FL1F7IVFbsqK3ZftnfQQdUGVyx46R6yr9cb/1+q/QvX17FSKqvtHA1wDu/wG7Io6lGhJjx+oScglV/P7r8lz+a4XTunKFWR+JCGd99La18fon/m3/+g+lTPrPf83YSy4JOarqe273c3z7uW9z25/fxunj4nMVYVhSkydjCU3Nk/DEL1H3tT4SFVbUuejnUWf27sV7e5nwub+kaX7xi24slWLsyn9HcszoKkVXPZ1NO9i5w0jPn0fzhDOiDkek5sUvUfdX1ENtfUSXqHPHjgPQsmIFo889N7I4otaby3+iSCe09KdIEOL3ea2GK+rc8WMAJEbVX5Vcib5EnUrErw4QqUXxS9SeBQwq7DXH4crEvoo6MaqxF4XKFC5WUkUtEowYJupcxdU0nEjUiUgr6kKiHt3Yibq/9ZFUohYJQiSfTXNdXeQ6O9//xY7j0JOEg5WtvJY7dISWnmOkjx4hc3BEAFFWLrM/P2c4MbrBWx9ZtT5EglT1/5OyR47w6vkX4IXq8/1NgrUfr2i/aeBegF9BlCt95BLG/9l+JzTwdK0X2l4A1PoQCUr1E/Xhw/jx44y95BJGfvjD791g60Ow81m4+B8q2u/Og8e448kdXHnebOZPGTPoduveWMfmfZsrDbtseybAxs2rQ9t/rZg9djYjktF8shGpN9X/bJrLT78bs/zPGHfppe99/eHNsPGPcNWVFe321Tf28+A7z3D5f/goE+dPHnS7LU9sYuv+o6xbFbu7iYmIvK/qfz4vzHcedFZHLjuktkHfPOpSVyZmc1mSFc7RFhGJUtUTtef6EvUgh/ZcxZePQ3+hXnIedSaXITmEWSUiIlGJ4IxXIVEPllA9W/FViVD+lYlZz5IyzUYQkdpR/URdKH2taOuj8kSdK/PKxIxnNG1MRGpKdD3qwfrQnhtaRV3mlYmZXEY9ahGpKdH1qAn2ZGKm/8rE4ttlc1n1qEWkptRNjzpXSY9arQ8RqSGRzaMedHpehbM+Xm8/ytfXbqKtI39XlXJaH6NSjb0Wh4jUlsh61IPeEa
PCk4kv7jzEhrcOMmvCKC5bMpPZk4qvs6HpeSJSa6peUZ+YRx1M6yOTze/vu59ZxKnjR5bcXtPzRKTWxPDKxMqWOe07iZgqc3lTVdQiUmsiSNSletTZinrUmULPu9xErYpaRGpN/HrUFd44oK/1kSpzSl8mpwteRKS2RDCPukRFnRtaRZ1MqvUhIvUpwh71YBV1hScTK+xRZ12r54lIbYkwUQ/yeoXT87LZChN1The8iEhtieyCF3vsm7DlfW7Hte81mHFO2bvrLXMxpj7qUYtIran+POq+ivqdDTD5LBh36skbtEyHhX9R9v6yuRyphLHujXX87JWfldz+WOaYWh8iUlMiqKj9xPcf/wosunxYu8tknWTC+PVbv+bVg6+yaMqiotufN/08LjztwmEdU0SkmiLoARSm5xlDWnf63TI5J51M0JvtZd74eay+WDeWFZH6EtmNA4Ah3XLr3TLZHMmE0ZPrIZ1ID3t/IiJxUzJTmtksM3vczF42s5fM7G+Gc0AfOOsjgJN6+Yra6M320pRsGvb+RETippySNgN8zd0XAB8DvmxmC4Z8xP5FmTyY1kehR62KWkTqVclE7e673f35wvcdwFbg1OLvKrpHoDCNOoDZF5mck0ok6Mn2qKIWkbpUUZPYzOYA5wDPvs9r15rZejNb397ePvhO+i8hJ6CTiTlSSSOTy9CUUKIWkfpTdqI2szHAz4GvuvuRd7/u7qvdfZm7L5syZcqg++nvUUNgsz6SCaMn20M6qdaHiNSfss7mmVmafJK+293/ZVhHzA04mVhG6+PBF99h96HOQV9/be9R0omEetQiUrdKJmozM+AOYKu7/9PwD1l+RX3oeA83rHmh5B4/uXAaL+V6lahFpC6VU1H/GXAVsNnMNhae+4a7/2pIR+xb68O8ZEXd0ZUB4Juf/iCXLRn8/GVzKsnH1uhkoojUp5KJ2t2fYvC17irmAy94KTGPurM3C8CEUWlGNRXfVvOoRaReRbDMaeGrASXuynKsO19Rj2oqXnnfvfVuMp5R60NE6lJ090yEkq2Pzp58RV2qmn5q11MAXDT7ouHFJiISQ9HdM7GMS8iP9Sfq4gm9K9PF0mlLOWPCGYGEKCISJ9HdMxFKzvo43lNe66Mr08XI1MhhxyYiEkdVW+b096+0cdPPN7Ps9c18GcCcy/7vs7xluwZ9T3dvea2Pzkwn01PTA4xWRCQ+qpaoN7ceZs+RLj58akv/cx8/YypnjZhW9H1TW0YwfVxz0W26sl00J4tvIyJSq6qWqPsuSPzMkpnsuTffo/7aJxfAhDnD3ndnplOtDxGpW1XrUWf7TiIOXOsjoHsXdmW6aE6pohaR+lS9ijrnnHSj8DJXzzvUdYi1r66lN9c76DadmU4lahGpW1VL1FnPr3J38q24SifqR3Y8wvef/37RbRKW4IzxmponIvWpij1qJ2HGyTe3LX34fZ37MIwNV20gWSSxJwK4/6KISBxVufVhA+ZRe8lLyAEOdh1k/IjxujxcRBpWqIna3fl/T/xPDnXuZ097BwsmdfL7LRnOBv553Fh6XroTUiOK7uP5tueZ2DwxzDBFRGIt1ET9VuvT/OOOB/I/GDABntiR42zgp+NbOPjSnWXtZ9X8VaHFKCISd6Em6u7uDgD+16mfYvOBT/C7rW383YcOs+eRH/LYpQ+QmjUvzMOLiNSFUBN1NpdfqyM5ciL70jPYnUjgIwsnBEeOC/PQIiJ1I9SpEplsDwCpRCo/6yNhJy5RLONEooiIhJ2oc/lEnUykyOacpFn/Mqf5+XkiIlJKqIm6r/WRTqTJOfkrE73vnolK1CIi5Qg1UfcWEnUqkc7Po04MqKjV+hARKUvIrY/8+hzJRLL/EnLPqfUhIlKJqrQ+UoXWx8k9alXUIiLlCHnWR19FnW992Ek96jCPLCJSP0KdR93X+kgn0mRzzqf/dD/7dzybf1E9ahGRsoSbqP1E6yPrzsK3t2CjRzH5yitJjNQdWUREyhHyyc
TClYnJNO7OiEw3oz/yEaZ85fowDysiUleqkqhThdZHOtODqZIWEalIdRJ1Mk3WoSnTQ6JZiVpEpBLhTs/zE4k6l82R7u0hMUqJWkSkEiFPzzvR+kj09pDAMVXUIiIVCS1R5zxHb6GiTiaaSPR05w+oHrWISEVCmZ6X9SzL71lOR08HF2/I0vb72/iPuzoBSIxsDuOQIiJ1K5SKujfbS0dPB58av5Crn8vQuXkbzZ1HaZ0xn5HnnBPGIUVE6lYoFbXjGMbKCQsZk13PiHOXcPvCa2hpTnHRPN1+S0SkEqFU1E5+4aUUgBuWTJArrJ4nIiKVCedkYmGBvBSG5+CNg93sPdKVXz1PREQqEnpF7Q7b2o6x90g3cyaPDuNwIiJ1LbQedf/O3Thl4hi2fXMFI1JaMU9EpFLhJGofUFHnIJVO05xOhnEoEZG6V7LENbMfm1mbmW2pdOcpN3BIpENdTVVEpK6V04v4Z2BFJTvta31YDtyNVEqJWkRkqEomand/AjhQyU77Wh+ezYJDMp0eWnQiIhLcrA8zu9bM1pvZ+rbDRwF44c0D+Yq6SYlaRGSoAkvU7r7a3Ze5+7LuXH6+9NZdh3GHcS1aiElEZKhCmS83Y3x+4aUb/nwe5GDaeM2fFhEZqnAmNvccAyC1exNgmHrUIiJDVs70vDXAH4GzzKzVzL5Y6j3eeRCA5Cu/zu8jqVkfIiJDVc6sj7909+nunnb3me5+R8n3jJkGQPJL/5p/IqWKWkRkqEJpfRzsOQJAqmUWAJbUpeMiIkMVSgbtyfWQtCSJXGEZvaQuHxcRGarQSt2f/fufQS4HgCWUqEVEhiq0RD2rZRZks/kfUkrUIiJDFUqiNoxR6VF4VhW1iMhwhXuWL5sBwFRRi4gMWaiJ2gs9alRRi4gMWShXoiQcjj3zDL3v7AZUUYuIDEcoiXrscXj781/o/znR0hLGYUREGkI4FXUOSKWY/ZO7sKYRNC/4QBiHERFpCKEkagMsnWbUkiVh7F5EpKGEMz3PwXT7LRGRQIQ0jxotbSoiEpBwpuepohYRCUyoPeqBent7aW1tpaurK4xDBqa5uZmZM2eS1icCEYmJcBL1+1TUra2ttLS0MGfOHMwsjMMOm7uzf/9+WltbmTt3btThiIgAIfaoSZ+cqLu6upg0aVJskzSAmTFp0qTYV/0i0ljCm/WRbnrv8zFO0n1qIUYRaSyhrfWhk4kiIsEIsaKO58m4X/ziF5gZ27ZtizoUEZGyhDePOqYV9Zo1a1i+fDlr1qyJOhQRkbI01Dzqo0eP8tRTT3HHHXdwzz33RB2OiEhZqjaPeqBbHnqJl985EugxF8wYy99fsrDoNg888AArVqzgzDPPZNKkSWzYsIGlS5cGGoeISNBC61G/e3peHKxZs4YrrrgCgCuuuELtDxGpCaFl02IVdanKNwwHDhzgscceY/PmzZgZ2WwWM+O73/2upuSJSKyFuHpevGZ9rF27lquuuoq33nqLHTt2sHPnTubOncuTTz4ZdWgiIkWFlKg9dtPz1qxZw6pVq0567rLLLlP7Q0RiL7zWR8xmfTz++OPvee6GG26IIBIRkcroxgEiIjEX3iXkMWt9iIjUqhAvIVdFLSIShBCXOVVFLSIShIa6hFxEpBaFuMypKmoRkSA01MnEZDLJ4sWLWbRoEUuWLOHpp5+OOiQRkZIaZh41wMiRI9m4cSMAjz76KDfffDN/+MMfIo5KRKS48CrqpvhV1AMdOXKECRMmRB2GiEhJ0VTUD98EezYHe8BTPgSf+seim3R2drJ48WK6urrYvXs3jz32WLAxiIiEoKyK2sxWmNkrZvaamd1U1p5j3PrYtm0bjzzyCFdffTXuHnVYIiJFlcymZpYEfghcBLQCz5nZg+7+ctH3FTuZWKLyrYbzzjuPffv20d7eztSpU6MOR0RkUOVU1OcCr7n7G+7eA9wDXFrsDZl0guazzgoivtBs27aNbDbLpEmTog5FRKSocvoTpwI7B/
zcCny02Bv2Tx/FiPnzhxNXKPp61ADuzl133UUymYw4KhGR4gJrJJvZtcC1AONmjwtqt4HKZrNRhyAiUrFyWh+7gFkDfp5ZeO4k7r7a3Ze5+7KmEU1BxSci0vDKSdTPAWeY2VwzawKuAB4MNywREelTsvXh7hkzux54FEgCP3b3l4q9x9DNYkVEglJWj9rdfwX8KuRYRETkfYR2CbmIiAQjpBsHqPUhIhKUhqqo9+zZwxVXXMG8efNYunQpK1euZPv27VGHJSJSVCgLcpjFr6J2d1atWsU111zDPffcA8CLL77I3r17OfPMMyOOTkRkcPFbOSkkjz/+OOl0muuuu67/uUWLFkUYkYhIecKpqEv0qG/9061sO7At0GOePfFs/vbcvx309S1btrB06dJAjykiUg3hnEyMYetDRKRWRdL6KFb5hmXhwoWsXbu26scVERmuhpmed+GFF9Ld3c3q1av7n9u0aRNPPvlkhFGJiJTWMNPzzIz777+f3/72t8ybN4+FCxdy8803c8opp0QdmohIUQ0zPQ9gxowZ3HvvvVGHISJSkYZpfYiI1KqGaX2IiNQqTc8TEYk5tT5ERGJOrQ8RkZhTRS0iEnMN1aNOJpMsXryYhQsXsmjRIr73ve+Ry+WiDktEpKiGWT0PYOTIkWzcuBGAtrY2Pve5z3HkyBFuueWWiCMTERlcw7Y+pk6dyurVq7n99ttx96jDEREZVCQV9Z5vfYvurcEuczriA2dzyje+UdF7Tj/9dLLZLG1tbUybNi3QeEREgtJQPWoRkVoUyY0DKq18w/LGG2+QTCaZOnVq1KGIiAyqYedRt7e3c91113H99dfrE4CIxFpDrZ7X2dnJ4sWL6e3tJZVKcdVVV3HjjTdGHZaISFENNT0vm81GHYKISMVCaX20NLWEsVsRkYbUsPOoRURqRVVPJtbChSW1EKOINJaqJerm5mb2798f60To7uzfv5/m5uaoQxER6Ve1k4kzZ86ktbWV9vb2ah1ySJqbm5k5c2bUYYiI9Ktaok6n08ydO7dahxMRqRsNe8GLiEitUKIWEYk5JWoRkZizMGZhmFkH8EoZm44DDsd4uyiPXQu/y2RgX4DHrqe/TS0cu5IYyx3rIMe5km3rYfzOcvf3v1rQ3QN/AOvL3G51nLerhRgj/l3KGudy91lnf5vYH7vCGAP7f7oO/zaB7LPY3zjq1sdDMd8uymPXwu9SiXL2WU9/m1o4dtzHuZJt62n83iOs1sd6d18W+I4lVjTOjUNjHb5if+OwKurVIe1X4kXj3Dg01uEb9G8cSkUtIiLBibpHXZPM7GiJ139vZvqYWOM0zo0j7mOtRC0iEnNDTtSl/gWqd2Z2vpmtG/Dz7Wb2+QhDCk0jj7XGuXHEeaxVUYuIxNywErWZjTGz35nZ82a22cwuLTw/x8y2mtmPzOwlM/u1mY0MJmSJgsa6MWic42m4FXUXsMrdlwAXAN+zE7cgPwP4obsvBA4Blw3zWHGT4eS/X73fbaBRx1rj3BjjDDEe6+EmagO+ZWabgN8CpwLTCq+96e4bC99vAOYM81hx8xawwMxGmNl44BNRBxSyRh1rjXNjjDPEeKyHe+OAvwKmAEvdvdfMdnDiX6HuAdtlgbr4mGRmKaDb3Xea2b3AFuBN4IVoIwtdQ421xrkxxhlqY6yHm6jHAW2FAb0AmB1ATHG3EHgdwN2/Dnz93Ru4+/lVjqkaGm2sNc6NMc5QA2M9pETd9y8QcDfwkJltBtYD2wKMLXbM7DrgBuCrUcdSLY041hrnxhhnqJ2xHtIl5Ga2CPiRu58bfEgSJxrrxqBxjreKTyYW/gVaA/yP4MORONFYNwaNc/xpUSYRkZgrq6I2s1lm9riZvVyY7P43hecnmtlvzOzVwtcJhefNzH5gZq+Z2SYzWzJgX9cUtn/VzK4J59eSoQh4nB8xs0MDL8mV+AhqrM1ssZn9sbCPTWZ2eZS/V90q8xYy04Elhe9bgO3AAuA7wE2F52
8Cbi18vxJ4mPyczI8Bzxaenwi8Ufg6ofD9hHJveaNHuI+gxrnw2ieAS4B1Uf9eeoQ31sCZwBmF72cAu4HxUf9+9fYoq6J2993u/nzh+w5gK/mJ8JcCdxU2uwv4dOH7S4GfeN4zwHgzmw58EviNux9w94PAb4AV5cQg4QtwnHH33wEd1YxfyhfUWLv7dnd/tbCfd4A28vOwJUBDOZk4BzgHeBaY5u67Cy/t4cQVTKcCOwe8rbXw3GDPS8wMc5ylhgQ11mZ2LtBEYU6yBKeiRG1mY4CfA1919yMDX/P8Zx+dmawDGufGEdRYFz5J/RT4grvnAg+0wZWdqM0sTX5A73b3fyk8vbfvo27ha1vh+V3ArAFvn1l4brDnJSYCGmepAUGNtZmNBX4J/PdCW0QCVu6sDwPuALa6+z8NeOlBoG/mxjXAAwOev7pwpvhjwOHCx6lHgYvNbELhbPLFheckBgIcZ4m5oMbazJqA+8n3r9dWKfzGU84ZR2A5+Y9Am4CNhcdKYBLwO+BV8ittTSxsb8APyfeqNgPLBuzrPwGvFR5fiPpsqh6hjfOTQDvQSb6f+cmofz89gh9r4Eqgd8A+NgKLo/796u2hC15ERGJOt+ISEYk5JWoRkZhTohYRiTklahGRmFOiFhGJOSVqEZGYU6IWEYk5JWoRkZj7/+7KeQitM/04AAAAAElFTkSuQmCC\n", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "kdf.plot()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Getting data in/out\n", "See the Input/Output\n", " docs." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### CSV\n", "\n", "CSV is straightforward and easy to use. See here to write a CSV file and here to read a CSV file." ] }, { "cell_type": "code", "execution_count": 51, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " A B C D\n", "0 0.976091 0.910572 -0.640756 0.034655\n", "1 0.976091 0.910572 -0.150827 0.034655\n", "2 0.976091 0.910572 0.796879 0.034655\n", "3 0.976091 0.910572 0.849741 0.034655\n", "4 0.976091 0.910572 0.849741 0.370709\n", "5 0.976091 0.910572 0.849741 0.698402\n", "6 0.976091 0.910572 1.217456 0.698402\n", "7 0.976091 0.910572 1.217456 0.698402\n", "8 0.976091 0.910572 1.217456 0.698402\n", "9 0.976091 0.910572 1.217456 0.698402" ] }, "execution_count": 51, "metadata": {}, "output_type": "execute_result" } ], "source": [ "kdf.to_csv('foo.csv')\n", "ks.read_csv('foo.csv').head(10)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Parquet\n", "\n", "Parquet is an efficient and compact file format to read and write faster. See here to write a Parquet file and here to read a Parquet file." ] }, { "cell_type": "code", "execution_count": 52, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " A B C D\n", "0 0.976091 0.910572 -0.640756 0.034655\n", "1 0.976091 0.910572 -0.150827 0.034655\n", "2 0.976091 0.910572 0.796879 0.034655\n", "3 0.976091 0.910572 0.849741 0.034655\n", "4 0.976091 0.910572 0.849741 0.370709\n", "5 0.976091 0.910572 0.849741 0.698402\n", "6 0.976091 0.910572 1.217456 0.698402\n", "7 0.976091 0.910572 1.217456 0.698402\n", "8 0.976091 0.910572 1.217456 0.698402\n", "9 0.976091 0.910572 1.217456 0.698402" ] }, "execution_count": 52, "metadata": {}, "output_type": "execute_result" } ], "source": [ "kdf.to_parquet('bar.parquet')\n", "ks.read_parquet('bar.parquet').head(10)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Spark IO\n", "\n", "In addition, Koalas fully support Spark's various datasources such as ORC and an external datasource. See here to write it to the specified datasource and here to read it from the datasource." ] }, { "cell_type": "code", "execution_count": 53, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " A B C D\n", "0 0.976091 0.910572 -0.640756 0.034655\n", "1 0.976091 0.910572 -0.150827 0.034655\n", "2 0.976091 0.910572 0.796879 0.034655\n", "3 0.976091 0.910572 0.849741 0.034655\n", "4 0.976091 0.910572 0.849741 0.370709\n", "5 0.976091 0.910572 0.849741 0.698402\n", "6 0.976091 0.910572 1.217456 0.698402\n", "7 0.976091 0.910572 1.217456 0.698402\n", "8 0.976091 0.910572 1.217456 0.698402\n", "9 0.976091 0.910572 1.217456 0.698402" ] }, "execution_count": 53, "metadata": {}, "output_type": "execute_result" } ], "source": [ "kdf.to_spark_io('zoo.orc', format=\"orc\")\n", "ks.read_spark_io('zoo.orc', format=\"orc\").head(10)" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.7.8" } }, "nbformat": 4, "nbformat_minor": 1 }