{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# 40 kotlin-dataframe puzzles\n",
"inspired by [100 pandas puzzles](https://github.com/ajcr/100-pandas-puzzles)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Importing kotlin-dataframe\n",
"### Getting started\n",
"Difficulty: easy\n",
"\n",
"**1.** Import kotlin-dataframe"
]
},
{
"cell_type": "code",
"metadata": {
"ExecuteTime": {
"end_time": "2025-03-07T10:53:35.275128Z",
"start_time": "2025-03-07T10:53:30.391909Z"
}
},
"source": [
"%useLatestDescriptors\n",
"%use dataframe"
],
"outputs": [],
"execution_count": 1
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## DataFrame Basics\n",
"### A few of the fundamental routines for selecting, sorting, adding and aggregating data in DataFrames\n",
"Difficulty: easy\n",
"\n",
"Consider the following columns:\n",
"```[kotlin]\n",
"val animal by columnOf(\"cat\", \"cat\", \"snake\", \"dog\", \"dog\", \"cat\", \"snake\", \"cat\", \"dog\", \"dog\")\n",
"val age by columnOf(2.5, 3.0, 0.5, Double.NaN, 5.0, 2.0, 4.5, Double.NaN, 7, 3)\n",
"val visits by columnOf(1, 3, 2, 3, 2, 3, 1, 1, 2, 1)\n",
"val priority by columnOf(\"yes\", \"yes\", \"no\", \"yes\", \"no\", \"no\", \"no\", \"yes\", \"no\", \"no\")\n",
"```\n",
"**2.** Create a DataFrame df from this columns."
]
},
{
"cell_type": "code",
"metadata": {
"ExecuteTime": {
"end_time": "2025-03-07T10:53:35.762624Z",
"start_time": "2025-03-07T10:53:35.302576Z"
}
},
"source": [
"val animal by columnOf(\"cat\", \"cat\", \"snake\", \"dog\", \"dog\", \"cat\", \"snake\", \"cat\", \"dog\", \"dog\")\n",
"val age by columnOf(2.5, 3.0, 0.5, Double.NaN, 5.0, 2.0, 4.5, Double.NaN, 7.0, 3.0)\n",
"val visits by columnOf(1, 3, 2, 3, 2, 3, 1, 1, 2, 1)\n",
"val priority by columnOf(\"yes\", \"yes\", \"no\", \"yes\", \"no\", \"no\", \"no\", \"yes\", \"no\", \"no\")\n",
"\n",
"val df = dataFrameOf(animal, age, visits, priority)\n",
"df"
],
"outputs": [
{
"data": {
"text/html": [
" \n",
" \n",
"
\n",
" \n",
" \n",
" \n",
" | animal | age | visits | priority |
|---|
| cat | 2,500000 | 1 | yes |
| cat | 3,000000 | 3 | yes |
| snake | 0,500000 | 2 | no |
| dog | NaN | 3 | yes |
| dog | 5,000000 | 2 | no |
| cat | 2,000000 | 3 | no |
| snake | 4,500000 | 1 | no |
| cat | NaN | 1 | yes |
| dog | 7,000000 | 2 | no |
| dog | 3,000000 | 1 | no |
\n",
" \n",
" \n",
" "
],
"application/kotlindataframe+json": "{\"$version\":\"2.1.1\",\"metadata\":{\"columns\":[\"animal\",\"age\",\"visits\",\"priority\"],\"types\":[{\"kind\":\"ValueColumn\",\"type\":\"kotlin.String\"},{\"kind\":\"ValueColumn\",\"type\":\"kotlin.Double\"},{\"kind\":\"ValueColumn\",\"type\":\"kotlin.Int\"},{\"kind\":\"ValueColumn\",\"type\":\"kotlin.String\"}],\"nrow\":10,\"ncol\":4},\"kotlin_dataframe\":[{\"animal\":\"cat\",\"age\":2.5,\"visits\":1,\"priority\":\"yes\"},{\"animal\":\"cat\",\"age\":3.0,\"visits\":3,\"priority\":\"yes\"},{\"animal\":\"snake\",\"age\":0.5,\"visits\":2,\"priority\":\"no\"},{\"animal\":\"dog\",\"age\":NaN,\"visits\":3,\"priority\":\"yes\"},{\"animal\":\"dog\",\"age\":5.0,\"visits\":2,\"priority\":\"no\"},{\"animal\":\"cat\",\"age\":2.0,\"visits\":3,\"priority\":\"no\"},{\"animal\":\"snake\",\"age\":4.5,\"visits\":1,\"priority\":\"no\"},{\"animal\":\"cat\",\"age\":NaN,\"visits\":1,\"priority\":\"yes\"},{\"animal\":\"dog\",\"age\":7.0,\"visits\":2,\"priority\":\"no\"},{\"animal\":\"dog\",\"age\":3.0,\"visits\":1,\"priority\":\"no\"}]}"
},
"execution_count": 2,
"metadata": {},
"output_type": "execute_result"
}
],
"execution_count": 2
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**3.** Display a summary of the basic information about this DataFrame and its data."
]
},
{
"cell_type": "code",
"metadata": {
"ExecuteTime": {
"end_time": "2025-03-07T10:53:35.832482Z",
"start_time": "2025-03-07T10:53:35.806600Z"
}
},
"source": [
"df.schema()"
],
"outputs": [
{
"data": {
"text/plain": [
"animal: String\n",
"age: Double\n",
"visits: Int\n",
"priority: String"
]
},
"execution_count": 3,
"metadata": {},
"output_type": "execute_result"
}
],
"execution_count": 3
},
{
"cell_type": "code",
"metadata": {
"ExecuteTime": {
"end_time": "2025-03-07T10:53:35.971412Z",
"start_time": "2025-03-07T10:53:35.905619Z"
}
},
"source": [
"df.describe()"
],
"outputs": [
{
"data": {
"text/html": [
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" | name | type | count | unique | nulls | top | freq | mean | std | min | median | max |
|---|
| animal | String | 10 | 3 | 0 | cat | 4 | null | null | cat | dog | snake |
| age | Double | 10 | 8 | 0 | 3,000000 | 2 | NaN | NaN | 0,500000 | 3,750000 | NaN |
| visits | Int | 10 | 3 | 0 | 1 | 4 | 1,900000 | 0,875595 | 1 | 2 | 3 |
| priority | String | 10 | 2 | 0 | no | 6 | null | null | no | no | yes |
\n",
" \n",
" \n",
" "
],
"application/kotlindataframe+json": "{\"$version\":\"2.1.1\",\"metadata\":{\"columns\":[\"name\",\"type\",\"count\",\"unique\",\"nulls\",\"top\",\"freq\",\"mean\",\"std\",\"min\",\"median\",\"max\"],\"types\":[{\"kind\":\"ValueColumn\",\"type\":\"kotlin.String\"},{\"kind\":\"ValueColumn\",\"type\":\"kotlin.Any\"},{\"kind\":\"ValueColumn\",\"type\":\"kotlin.Int\"},{\"kind\":\"ValueColumn\",\"type\":\"kotlin.Int\"},{\"kind\":\"ValueColumn\",\"type\":\"kotlin.Int\"},{\"kind\":\"ValueColumn\",\"type\":\"kotlin.Comparable<*>\"},{\"kind\":\"ValueColumn\",\"type\":\"kotlin.Int\"},{\"kind\":\"ValueColumn\",\"type\":\"kotlin.Double?\"},{\"kind\":\"ValueColumn\",\"type\":\"kotlin.Double?\"},{\"kind\":\"ValueColumn\",\"type\":\"kotlin.Comparable<*>\"},{\"kind\":\"ValueColumn\",\"type\":\"kotlin.Comparable<*>\"},{\"kind\":\"ValueColumn\",\"type\":\"kotlin.Comparable<*>\"}],\"nrow\":4,\"ncol\":12},\"kotlin_dataframe\":[{\"name\":\"animal\",\"type\":\"String\",\"count\":10,\"unique\":3,\"nulls\":0,\"top\":\"cat\",\"freq\":4,\"mean\":null,\"std\":null,\"min\":\"cat\",\"median\":\"dog\",\"max\":\"snake\"},{\"name\":\"age\",\"type\":\"Double\",\"count\":10,\"unique\":8,\"nulls\":0,\"top\":\"3.0\",\"freq\":2,\"mean\":NaN,\"std\":NaN,\"min\":\"0.5\",\"median\":\"3.75\",\"max\":\"NaN\"},{\"name\":\"visits\",\"type\":\"Int\",\"count\":10,\"unique\":3,\"nulls\":0,\"top\":\"1\",\"freq\":4,\"mean\":1.9,\"std\":0.8755950357709131,\"min\":\"1\",\"median\":\"2\",\"max\":\"3\"},{\"name\":\"priority\",\"type\":\"String\",\"count\":10,\"unique\":2,\"nulls\":0,\"top\":\"no\",\"freq\":6,\"mean\":null,\"std\":null,\"min\":\"no\",\"median\":\"no\",\"max\":\"yes\"}]}"
},
"execution_count": 4,
"metadata": {},
"output_type": "execute_result"
}
],
"execution_count": 4
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**4.** Return the first 3 rows of the DataFrame df."
]
},
{
"cell_type": "code",
"metadata": {
"ExecuteTime": {
"end_time": "2025-03-07T10:53:36.032031Z",
"start_time": "2025-03-07T10:53:35.974111Z"
}
},
"source": [
"df[0 ..< 3] // df[0..2]\n",
"\n",
"// or equivalently\n",
"\n",
"df.head(3)\n",
"\n",
"// or\n",
"\n",
"df.take(3)"
],
"outputs": [
{
"data": {
"text/html": [
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" | animal | age | visits | priority |
|---|
| cat | 2,500000 | 1 | yes |
| cat | 3,000000 | 3 | yes |
| snake | 0,500000 | 2 | no |
\n",
" \n",
" \n",
" "
],
"application/kotlindataframe+json": "{\"$version\":\"2.1.1\",\"metadata\":{\"columns\":[\"animal\",\"age\",\"visits\",\"priority\"],\"types\":[{\"kind\":\"ValueColumn\",\"type\":\"kotlin.String\"},{\"kind\":\"ValueColumn\",\"type\":\"kotlin.Double\"},{\"kind\":\"ValueColumn\",\"type\":\"kotlin.Int\"},{\"kind\":\"ValueColumn\",\"type\":\"kotlin.String\"}],\"nrow\":3,\"ncol\":4},\"kotlin_dataframe\":[{\"animal\":\"cat\",\"age\":2.5,\"visits\":1,\"priority\":\"yes\"},{\"animal\":\"cat\",\"age\":3.0,\"visits\":3,\"priority\":\"yes\"},{\"animal\":\"snake\",\"age\":0.5,\"visits\":2,\"priority\":\"no\"}]}"
},
"execution_count": 5,
"metadata": {},
"output_type": "execute_result"
}
],
"execution_count": 5
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**5.** Select \"animal\" and \"age\" columns from the DataFrame df."
]
},
{
"cell_type": "code",
"metadata": {
"ExecuteTime": {
"end_time": "2025-03-07T10:53:36.104595Z",
"start_time": "2025-03-07T10:53:36.069622Z"
}
},
"source": [
"df[animal, age]"
],
"outputs": [
{
"data": {
"text/html": [
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" | animal | age |
|---|
| cat | 2,500000 |
| cat | 3,000000 |
| snake | 0,500000 |
| dog | NaN |
| dog | 5,000000 |
| cat | 2,000000 |
| snake | 4,500000 |
| cat | NaN |
| dog | 7,000000 |
| dog | 3,000000 |
\n",
" \n",
" \n",
" "
],
"application/kotlindataframe+json": "{\"$version\":\"2.1.1\",\"metadata\":{\"columns\":[\"animal\",\"age\"],\"types\":[{\"kind\":\"ValueColumn\",\"type\":\"kotlin.String\"},{\"kind\":\"ValueColumn\",\"type\":\"kotlin.Double\"}],\"nrow\":10,\"ncol\":2},\"kotlin_dataframe\":[{\"animal\":\"cat\",\"age\":2.5},{\"animal\":\"cat\",\"age\":3.0},{\"animal\":\"snake\",\"age\":0.5},{\"animal\":\"dog\",\"age\":NaN},{\"animal\":\"dog\",\"age\":5.0},{\"animal\":\"cat\",\"age\":2.0},{\"animal\":\"snake\",\"age\":4.5},{\"animal\":\"cat\",\"age\":NaN},{\"animal\":\"dog\",\"age\":7.0},{\"animal\":\"dog\",\"age\":3.0}]}"
},
"execution_count": 6,
"metadata": {},
"output_type": "execute_result"
}
],
"execution_count": 6
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**6.** Select the data in rows [3, 4, 8] and in columns [\"animal\", \"age\"]."
]
},
{
"cell_type": "code",
"metadata": {
"ExecuteTime": {
"end_time": "2025-03-07T10:53:36.153990Z",
"start_time": "2025-03-07T10:53:36.119702Z"
}
},
"source": [
"df[3, 4, 8][animal, age]"
],
"outputs": [
{
"data": {
"text/html": [
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" | animal | age |
|---|
| dog | NaN |
| dog | 5,000000 |
| dog | 7,000000 |
\n",
" \n",
" \n",
" "
],
"application/kotlindataframe+json": "{\"$version\":\"2.1.1\",\"metadata\":{\"columns\":[\"animal\",\"age\"],\"types\":[{\"kind\":\"ValueColumn\",\"type\":\"kotlin.String\"},{\"kind\":\"ValueColumn\",\"type\":\"kotlin.Double\"}],\"nrow\":3,\"ncol\":2},\"kotlin_dataframe\":[{\"animal\":\"dog\",\"age\":NaN},{\"animal\":\"dog\",\"age\":5.0},{\"animal\":\"dog\",\"age\":7.0}]}"
},
"execution_count": 7,
"metadata": {},
"output_type": "execute_result"
}
],
"execution_count": 7
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**7.** Select only the rows where the number of visits is grater than 2."
]
},
{
"cell_type": "code",
"metadata": {
"ExecuteTime": {
"end_time": "2025-03-07T10:53:36.297505Z",
"start_time": "2025-03-07T10:53:36.217012Z"
}
},
"source": [
"df.filter { visits > 2 }"
],
"outputs": [
{
"data": {
"text/html": [
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" | animal | age | visits | priority |
|---|
| cat | 3,000000 | 3 | yes |
| dog | NaN | 3 | yes |
| cat | 2,000000 | 3 | no |
\n",
" \n",
" \n",
" "
],
"application/kotlindataframe+json": "{\"$version\":\"2.1.1\",\"metadata\":{\"columns\":[\"animal\",\"age\",\"visits\",\"priority\"],\"types\":[{\"kind\":\"ValueColumn\",\"type\":\"kotlin.String\"},{\"kind\":\"ValueColumn\",\"type\":\"kotlin.Double\"},{\"kind\":\"ValueColumn\",\"type\":\"kotlin.Int\"},{\"kind\":\"ValueColumn\",\"type\":\"kotlin.String\"}],\"nrow\":3,\"ncol\":4},\"kotlin_dataframe\":[{\"animal\":\"cat\",\"age\":3.0,\"visits\":3,\"priority\":\"yes\"},{\"animal\":\"dog\",\"age\":NaN,\"visits\":3,\"priority\":\"yes\"},{\"animal\":\"cat\",\"age\":2.0,\"visits\":3,\"priority\":\"no\"}]}"
},
"execution_count": 8,
"metadata": {},
"output_type": "execute_result"
}
],
"execution_count": 8
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**8.** Select the rows where the age is missing, i.e. it is NaN."
]
},
{
"cell_type": "code",
"metadata": {
"ExecuteTime": {
"end_time": "2025-03-07T10:53:36.336388Z",
"start_time": "2025-03-07T10:53:36.301283Z"
}
},
"source": [
"df.filter { age.isNaN() }"
],
"outputs": [
{
"data": {
"text/html": [
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" | animal | age | visits | priority |
|---|
| dog | NaN | 3 | yes |
| cat | NaN | 1 | yes |
\n",
" \n",
" \n",
" "
],
"application/kotlindataframe+json": "{\"$version\":\"2.1.1\",\"metadata\":{\"columns\":[\"animal\",\"age\",\"visits\",\"priority\"],\"types\":[{\"kind\":\"ValueColumn\",\"type\":\"kotlin.String\"},{\"kind\":\"ValueColumn\",\"type\":\"kotlin.Double\"},{\"kind\":\"ValueColumn\",\"type\":\"kotlin.Int\"},{\"kind\":\"ValueColumn\",\"type\":\"kotlin.String\"}],\"nrow\":2,\"ncol\":4},\"kotlin_dataframe\":[{\"animal\":\"dog\",\"age\":NaN,\"visits\":3,\"priority\":\"yes\"},{\"animal\":\"cat\",\"age\":NaN,\"visits\":1,\"priority\":\"yes\"}]}"
},
"execution_count": 9,
"metadata": {},
"output_type": "execute_result"
}
],
"execution_count": 9
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"N**9.** Select the rows where the animal is a cat and the age is less than 3."
]
},
{
"cell_type": "code",
"metadata": {
"ExecuteTime": {
"end_time": "2025-03-07T10:53:36.433863Z",
"start_time": "2025-03-07T10:53:36.393523Z"
}
},
"source": [
"df.filter { animal == \"cat\" && age < 3 }"
],
"outputs": [
{
"data": {
"text/html": [
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" | animal | age | visits | priority |
|---|
| cat | 2,500000 | 1 | yes |
| cat | 2,000000 | 3 | no |
\n",
" \n",
" \n",
" "
],
"application/kotlindataframe+json": "{\"$version\":\"2.1.1\",\"metadata\":{\"columns\":[\"animal\",\"age\",\"visits\",\"priority\"],\"types\":[{\"kind\":\"ValueColumn\",\"type\":\"kotlin.String\"},{\"kind\":\"ValueColumn\",\"type\":\"kotlin.Double\"},{\"kind\":\"ValueColumn\",\"type\":\"kotlin.Int\"},{\"kind\":\"ValueColumn\",\"type\":\"kotlin.String\"}],\"nrow\":2,\"ncol\":4},\"kotlin_dataframe\":[{\"animal\":\"cat\",\"age\":2.5,\"visits\":1,\"priority\":\"yes\"},{\"animal\":\"cat\",\"age\":2.0,\"visits\":3,\"priority\":\"no\"}]}"
},
"execution_count": 10,
"metadata": {},
"output_type": "execute_result"
}
],
"execution_count": 10
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**10.** Select the rows where age is between 2 and 4 (inclusive)."
]
},
{
"cell_type": "code",
"metadata": {
"ExecuteTime": {
"end_time": "2025-03-07T10:53:36.499414Z",
"start_time": "2025-03-07T10:53:36.440679Z"
}
},
"source": [
"df.filter { age in 2.0..4.0 }"
],
"outputs": [
{
"data": {
"text/html": [
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" | animal | age | visits | priority |
|---|
| cat | 2,500000 | 1 | yes |
| cat | 3,000000 | 3 | yes |
| cat | 2,000000 | 3 | no |
| dog | 3,000000 | 1 | no |
\n",
" \n",
" \n",
" "
],
"application/kotlindataframe+json": "{\"$version\":\"2.1.1\",\"metadata\":{\"columns\":[\"animal\",\"age\",\"visits\",\"priority\"],\"types\":[{\"kind\":\"ValueColumn\",\"type\":\"kotlin.String\"},{\"kind\":\"ValueColumn\",\"type\":\"kotlin.Double\"},{\"kind\":\"ValueColumn\",\"type\":\"kotlin.Int\"},{\"kind\":\"ValueColumn\",\"type\":\"kotlin.String\"}],\"nrow\":4,\"ncol\":4},\"kotlin_dataframe\":[{\"animal\":\"cat\",\"age\":2.5,\"visits\":1,\"priority\":\"yes\"},{\"animal\":\"cat\",\"age\":3.0,\"visits\":3,\"priority\":\"yes\"},{\"animal\":\"cat\",\"age\":2.0,\"visits\":3,\"priority\":\"no\"},{\"animal\":\"dog\",\"age\":3.0,\"visits\":1,\"priority\":\"no\"}]}"
},
"execution_count": 11,
"metadata": {},
"output_type": "execute_result"
}
],
"execution_count": 11
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**11.** Change tha age in row 5 to 1.5"
]
},
{
"cell_type": "code",
"metadata": {
"ExecuteTime": {
"end_time": "2025-03-07T10:53:36.615998Z",
"start_time": "2025-03-07T10:53:36.561816Z"
}
},
"source": [
"df.update { age }.at(5).with { 1.5 }"
],
"outputs": [
{
"data": {
"text/html": [
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" | animal | age | visits | priority |
|---|
| cat | 2,500000 | 1 | yes |
| cat | 3,000000 | 3 | yes |
| snake | 0,500000 | 2 | no |
| dog | NaN | 3 | yes |
| dog | 5,000000 | 2 | no |
| cat | 1,500000 | 3 | no |
| snake | 4,500000 | 1 | no |
| cat | NaN | 1 | yes |
| dog | 7,000000 | 2 | no |
| dog | 3,000000 | 1 | no |
\n",
" \n",
" \n",
" "
],
"application/kotlindataframe+json": "{\"$version\":\"2.1.1\",\"metadata\":{\"columns\":[\"animal\",\"age\",\"visits\",\"priority\"],\"types\":[{\"kind\":\"ValueColumn\",\"type\":\"kotlin.String\"},{\"kind\":\"ValueColumn\",\"type\":\"kotlin.Double\"},{\"kind\":\"ValueColumn\",\"type\":\"kotlin.Int\"},{\"kind\":\"ValueColumn\",\"type\":\"kotlin.String\"}],\"nrow\":10,\"ncol\":4},\"kotlin_dataframe\":[{\"animal\":\"cat\",\"age\":2.5,\"visits\":1,\"priority\":\"yes\"},{\"animal\":\"cat\",\"age\":3.0,\"visits\":3,\"priority\":\"yes\"},{\"animal\":\"snake\",\"age\":0.5,\"visits\":2,\"priority\":\"no\"},{\"animal\":\"dog\",\"age\":NaN,\"visits\":3,\"priority\":\"yes\"},{\"animal\":\"dog\",\"age\":5.0,\"visits\":2,\"priority\":\"no\"},{\"animal\":\"cat\",\"age\":1.5,\"visits\":3,\"priority\":\"no\"},{\"animal\":\"snake\",\"age\":4.5,\"visits\":1,\"priority\":\"no\"},{\"animal\":\"cat\",\"age\":NaN,\"visits\":1,\"priority\":\"yes\"},{\"animal\":\"dog\",\"age\":7.0,\"visits\":2,\"priority\":\"no\"},{\"animal\":\"dog\",\"age\":3.0,\"visits\":1,\"priority\":\"no\"}]}"
},
"execution_count": 12,
"metadata": {},
"output_type": "execute_result"
}
],
"execution_count": 12
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**12.** Calculate the sum of all visits in df (i.e. the total number of visits)."
]
},
{
"cell_type": "code",
"metadata": {
"ExecuteTime": {
"end_time": "2025-03-07T10:53:36.645646Z",
"start_time": "2025-03-07T10:53:36.619592Z"
}
},
"source": [
"df.visits.sum()"
],
"outputs": [
{
"data": {
"text/plain": [
"19"
]
},
"execution_count": 13,
"metadata": {},
"output_type": "execute_result"
}
],
"execution_count": 13
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**13.** Calculate the mean age for each different animal in df."
]
},
{
"cell_type": "code",
"metadata": {
"ExecuteTime": {
"end_time": "2025-03-07T10:53:36.759670Z",
"start_time": "2025-03-07T10:53:36.705568Z"
}
},
"source": [
"df.groupBy { animal }.mean { age }"
],
"outputs": [
{
"data": {
"text/html": [
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" | animal | age |
|---|
| cat | NaN |
| snake | 2,500000 |
| dog | NaN |
\n",
" \n",
" \n",
" "
],
"application/kotlindataframe+json": "{\"$version\":\"2.1.1\",\"metadata\":{\"columns\":[\"animal\",\"age\"],\"types\":[{\"kind\":\"ValueColumn\",\"type\":\"kotlin.String\"},{\"kind\":\"ValueColumn\",\"type\":\"kotlin.Double\"}],\"nrow\":3,\"ncol\":2},\"kotlin_dataframe\":[{\"animal\":\"cat\",\"age\":NaN},{\"animal\":\"snake\",\"age\":2.5},{\"animal\":\"dog\",\"age\":NaN}]}"
},
"execution_count": 14,
"metadata": {},
"output_type": "execute_result"
}
],
"execution_count": 14
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**14.** Append a new row to df with your choice of values for each column. Then delete that row to return the original DataFrame."
]
},
{
"cell_type": "code",
"metadata": {
"ExecuteTime": {
"end_time": "2025-03-07T10:53:36.796707Z",
"start_time": "2025-03-07T10:53:36.762193Z"
}
},
"source": [
"val modifiedDf = df.append(\"dog\", 5.5, 2, \"no\")\n",
"modifiedDf.dropLast()"
],
"outputs": [
{
"data": {
"text/html": [
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" | animal | age | visits | priority |
|---|
| cat | 2,500000 | 1 | yes |
| cat | 3,000000 | 3 | yes |
| snake | 0,500000 | 2 | no |
| dog | NaN | 3 | yes |
| dog | 5,000000 | 2 | no |
| cat | 2,000000 | 3 | no |
| snake | 4,500000 | 1 | no |
| cat | NaN | 1 | yes |
| dog | 7,000000 | 2 | no |
| dog | 3,000000 | 1 | no |
\n",
" \n",
" \n",
" "
],
"application/kotlindataframe+json": "{\"$version\":\"2.1.1\",\"metadata\":{\"columns\":[\"animal\",\"age\",\"visits\",\"priority\"],\"types\":[{\"kind\":\"ValueColumn\",\"type\":\"kotlin.String\"},{\"kind\":\"ValueColumn\",\"type\":\"kotlin.Double\"},{\"kind\":\"ValueColumn\",\"type\":\"kotlin.Int\"},{\"kind\":\"ValueColumn\",\"type\":\"kotlin.String\"}],\"nrow\":10,\"ncol\":4},\"kotlin_dataframe\":[{\"animal\":\"cat\",\"age\":2.5,\"visits\":1,\"priority\":\"yes\"},{\"animal\":\"cat\",\"age\":3.0,\"visits\":3,\"priority\":\"yes\"},{\"animal\":\"snake\",\"age\":0.5,\"visits\":2,\"priority\":\"no\"},{\"animal\":\"dog\",\"age\":NaN,\"visits\":3,\"priority\":\"yes\"},{\"animal\":\"dog\",\"age\":5.0,\"visits\":2,\"priority\":\"no\"},{\"animal\":\"cat\",\"age\":2.0,\"visits\":3,\"priority\":\"no\"},{\"animal\":\"snake\",\"age\":4.5,\"visits\":1,\"priority\":\"no\"},{\"animal\":\"cat\",\"age\":NaN,\"visits\":1,\"priority\":\"yes\"},{\"animal\":\"dog\",\"age\":7.0,\"visits\":2,\"priority\":\"no\"},{\"animal\":\"dog\",\"age\":3.0,\"visits\":1,\"priority\":\"no\"}]}"
},
"execution_count": 15,
"metadata": {},
"output_type": "execute_result"
}
],
"execution_count": 15
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**15.** Count the number of each type of animal in df."
]
},
{
"cell_type": "code",
"metadata": {
"ExecuteTime": {
"end_time": "2025-03-07T10:53:36.873757Z",
"start_time": "2025-03-07T10:53:36.828397Z"
}
},
"source": [
"df.groupBy { animal }.count()"
],
"outputs": [
{
"data": {
"text/html": [
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" | animal | count |
|---|
| cat | 4 |
| snake | 2 |
| dog | 4 |
\n",
" \n",
" \n",
" "
],
"application/kotlindataframe+json": "{\"$version\":\"2.1.1\",\"metadata\":{\"columns\":[\"animal\",\"count\"],\"types\":[{\"kind\":\"ValueColumn\",\"type\":\"kotlin.String\"},{\"kind\":\"ValueColumn\",\"type\":\"kotlin.Int\"}],\"nrow\":3,\"ncol\":2},\"kotlin_dataframe\":[{\"animal\":\"cat\",\"count\":4},{\"animal\":\"snake\",\"count\":2},{\"animal\":\"dog\",\"count\":4}]}"
},
"execution_count": 16,
"metadata": {},
"output_type": "execute_result"
}
],
"execution_count": 16
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**16.** Sort df first by the values in the 'age' in descending order, then by the value in the 'visits' column in ascending order."
]
},
{
"cell_type": "code",
"metadata": {
"ExecuteTime": {
"end_time": "2025-03-07T10:53:36.935657Z",
"start_time": "2025-03-07T10:53:36.881637Z"
}
},
"source": [
"df.sortBy { age.desc() and visits }"
],
"outputs": [
{
"data": {
"text/html": [
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" | animal | age | visits | priority |
|---|
| cat | NaN | 1 | yes |
| dog | NaN | 3 | yes |
| dog | 7,000000 | 2 | no |
| dog | 5,000000 | 2 | no |
| snake | 4,500000 | 1 | no |
| dog | 3,000000 | 1 | no |
| cat | 3,000000 | 3 | yes |
| cat | 2,500000 | 1 | yes |
| cat | 2,000000 | 3 | no |
| snake | 0,500000 | 2 | no |
\n",
" \n",
" \n",
" "
],
"application/kotlindataframe+json": "{\"$version\":\"2.1.1\",\"metadata\":{\"columns\":[\"animal\",\"age\",\"visits\",\"priority\"],\"types\":[{\"kind\":\"ValueColumn\",\"type\":\"kotlin.String\"},{\"kind\":\"ValueColumn\",\"type\":\"kotlin.Double\"},{\"kind\":\"ValueColumn\",\"type\":\"kotlin.Int\"},{\"kind\":\"ValueColumn\",\"type\":\"kotlin.String\"}],\"nrow\":10,\"ncol\":4},\"kotlin_dataframe\":[{\"animal\":\"cat\",\"age\":NaN,\"visits\":1,\"priority\":\"yes\"},{\"animal\":\"dog\",\"age\":NaN,\"visits\":3,\"priority\":\"yes\"},{\"animal\":\"dog\",\"age\":7.0,\"visits\":2,\"priority\":\"no\"},{\"animal\":\"dog\",\"age\":5.0,\"visits\":2,\"priority\":\"no\"},{\"animal\":\"snake\",\"age\":4.5,\"visits\":1,\"priority\":\"no\"},{\"animal\":\"dog\",\"age\":3.0,\"visits\":1,\"priority\":\"no\"},{\"animal\":\"cat\",\"age\":3.0,\"visits\":3,\"priority\":\"yes\"},{\"animal\":\"cat\",\"age\":2.5,\"visits\":1,\"priority\":\"yes\"},{\"animal\":\"cat\",\"age\":2.0,\"visits\":3,\"priority\":\"no\"},{\"animal\":\"snake\",\"age\":0.5,\"visits\":2,\"priority\":\"no\"}]}"
},
"execution_count": 17,
"metadata": {},
"output_type": "execute_result"
}
],
"execution_count": 17
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**17.** The 'priority' column contains the values 'yes' and 'no'. Replace this column with a column of boolean values: 'yes' should be True and 'no' should be False."
]
},
{
"cell_type": "code",
"metadata": {
"ExecuteTime": {
"end_time": "2025-03-07T10:53:37.037742Z",
"start_time": "2025-03-07T10:53:36.991538Z"
}
},
"source": [
"df.convert { priority }.with { it == \"yes\" }"
],
"outputs": [
{
"data": {
"text/html": [
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" | animal | age | visits | priority |
|---|
| cat | 2,500000 | 1 | true |
| cat | 3,000000 | 3 | true |
| snake | 0,500000 | 2 | false |
| dog | NaN | 3 | true |
| dog | 5,000000 | 2 | false |
| cat | 2,000000 | 3 | false |
| snake | 4,500000 | 1 | false |
| cat | NaN | 1 | true |
| dog | 7,000000 | 2 | false |
| dog | 3,000000 | 1 | false |
\n",
" \n",
" \n",
" "
],
"application/kotlindataframe+json": "{\"$version\":\"2.1.1\",\"metadata\":{\"columns\":[\"animal\",\"age\",\"visits\",\"priority\"],\"types\":[{\"kind\":\"ValueColumn\",\"type\":\"kotlin.String\"},{\"kind\":\"ValueColumn\",\"type\":\"kotlin.Double\"},{\"kind\":\"ValueColumn\",\"type\":\"kotlin.Int\"},{\"kind\":\"ValueColumn\",\"type\":\"kotlin.Boolean\"}],\"nrow\":10,\"ncol\":4},\"kotlin_dataframe\":[{\"animal\":\"cat\",\"age\":2.5,\"visits\":1,\"priority\":true},{\"animal\":\"cat\",\"age\":3.0,\"visits\":3,\"priority\":true},{\"animal\":\"snake\",\"age\":0.5,\"visits\":2,\"priority\":false},{\"animal\":\"dog\",\"age\":NaN,\"visits\":3,\"priority\":true},{\"animal\":\"dog\",\"age\":5.0,\"visits\":2,\"priority\":false},{\"animal\":\"cat\",\"age\":2.0,\"visits\":3,\"priority\":false},{\"animal\":\"snake\",\"age\":4.5,\"visits\":1,\"priority\":false},{\"animal\":\"cat\",\"age\":NaN,\"visits\":1,\"priority\":true},{\"animal\":\"dog\",\"age\":7.0,\"visits\":2,\"priority\":false},{\"animal\":\"dog\",\"age\":3.0,\"visits\":1,\"priority\":false}]}"
},
"execution_count": 18,
"metadata": {},
"output_type": "execute_result"
}
],
"execution_count": 18
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**18.** In the 'animal' column, change the 'dog' entries to 'corgi'."
]
},
{
"cell_type": "code",
"metadata": {
"ExecuteTime": {
"end_time": "2025-03-07T10:53:37.091972Z",
"start_time": "2025-03-07T10:53:37.050846Z"
}
},
"source": [
"df.update { animal }.where { it == \"dog\" }.with { \"corgi\" }"
],
"outputs": [
{
"data": {
"text/html": [
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" | animal | age | visits | priority |
|---|
| cat | 2,500000 | 1 | yes |
| cat | 3,000000 | 3 | yes |
| snake | 0,500000 | 2 | no |
| corgi | NaN | 3 | yes |
| corgi | 5,000000 | 2 | no |
| cat | 2,000000 | 3 | no |
| snake | 4,500000 | 1 | no |
| cat | NaN | 1 | yes |
| corgi | 7,000000 | 2 | no |
| corgi | 3,000000 | 1 | no |
\n",
" \n",
" \n",
" "
],
"application/kotlindataframe+json": "{\"$version\":\"2.1.1\",\"metadata\":{\"columns\":[\"animal\",\"age\",\"visits\",\"priority\"],\"types\":[{\"kind\":\"ValueColumn\",\"type\":\"kotlin.String\"},{\"kind\":\"ValueColumn\",\"type\":\"kotlin.Double\"},{\"kind\":\"ValueColumn\",\"type\":\"kotlin.Int\"},{\"kind\":\"ValueColumn\",\"type\":\"kotlin.String\"}],\"nrow\":10,\"ncol\":4},\"kotlin_dataframe\":[{\"animal\":\"cat\",\"age\":2.5,\"visits\":1,\"priority\":\"yes\"},{\"animal\":\"cat\",\"age\":3.0,\"visits\":3,\"priority\":\"yes\"},{\"animal\":\"snake\",\"age\":0.5,\"visits\":2,\"priority\":\"no\"},{\"animal\":\"corgi\",\"age\":NaN,\"visits\":3,\"priority\":\"yes\"},{\"animal\":\"corgi\",\"age\":5.0,\"visits\":2,\"priority\":\"no\"},{\"animal\":\"cat\",\"age\":2.0,\"visits\":3,\"priority\":\"no\"},{\"animal\":\"snake\",\"age\":4.5,\"visits\":1,\"priority\":\"no\"},{\"animal\":\"cat\",\"age\":NaN,\"visits\":1,\"priority\":\"yes\"},{\"animal\":\"corgi\",\"age\":7.0,\"visits\":2,\"priority\":\"no\"},{\"animal\":\"corgi\",\"age\":3.0,\"visits\":1,\"priority\":\"no\"}]}"
},
"execution_count": 19,
"metadata": {},
"output_type": "execute_result"
}
],
"execution_count": 19
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**19.** For each animal type and each number of visits, find the mean age. In other words, each row is an animal, each column is a number of visits and the values are the mean ages."
]
},
{
"cell_type": "code",
"metadata": {
"ExecuteTime": {
"end_time": "2025-03-07T10:53:37.193661Z",
"start_time": "2025-03-07T10:53:37.138201Z"
}
},
"source": [
"df.pivot { visits }.groupBy { animal }.mean(skipNA = true) { age }"
],
"outputs": [
{
"data": {
"text/html": [
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" | animal | visits | | |
|---|
| 1 | 3 | 2 |
|---|
| cat | 2,500000 | 2,500000 | null |
| snake | 4,500000 | null | 0,500000 |
| dog | 3,000000 | NaN | 6,000000 |
\n",
" \n",
" \n",
" "
],
"application/kotlindataframe+json": "{\"$version\":\"2.1.1\",\"metadata\":{\"columns\":[\"animal\",\"visits\"],\"types\":[{\"kind\":\"ValueColumn\",\"type\":\"kotlin.String\"},{\"kind\":\"ColumnGroup\"}],\"nrow\":3,\"ncol\":2},\"kotlin_dataframe\":[{\"animal\":\"cat\",\"visits\":{\"data\":{\"1\":2.5,\"3\":2.5,\"2\":null},\"metadata\":{\"kind\":\"ColumnGroup\",\"columns\":[\"1\",\"3\",\"2\"],\"types\":[{\"kind\":\"ValueColumn\",\"type\":\"kotlin.Double\"},{\"kind\":\"ValueColumn\",\"type\":\"kotlin.Double?\"},{\"kind\":\"ValueColumn\",\"type\":\"kotlin.Double?\"}]}}},{\"animal\":\"snake\",\"visits\":{\"data\":{\"1\":4.5,\"3\":null,\"2\":0.5},\"metadata\":{\"kind\":\"ColumnGroup\",\"columns\":[\"1\",\"3\",\"2\"],\"types\":[{\"kind\":\"ValueColumn\",\"type\":\"kotlin.Double\"},{\"kind\":\"ValueColumn\",\"type\":\"kotlin.Double?\"},{\"kind\":\"ValueColumn\",\"type\":\"kotlin.Double?\"}]}}},{\"animal\":\"dog\",\"visits\":{\"data\":{\"1\":3.0,\"3\":NaN,\"2\":6.0},\"metadata\":{\"kind\":\"ColumnGroup\",\"columns\":[\"1\",\"3\",\"2\"],\"types\":[{\"kind\":\"ValueColumn\",\"type\":\"kotlin.Double\"},{\"kind\":\"ValueColumn\",\"type\":\"kotlin.Double?\"},{\"kind\":\"ValueColumn\",\"type\":\"kotlin.Double?\"}]}}}]}"
},
"execution_count": 20,
"metadata": {},
"output_type": "execute_result"
}
],
"execution_count": 20
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## DataFrame: beyond the basics\n",
"### Slightly trickier: you may need to combine two or more methods to get the right answer\n",
"Difficulty: medium\n",
"\n",
"The previous section was tour through some basic but essential DataFrame operations. Below are some ways that you might need to cut your data, but for which there is no single \"out of the box\" method."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**20.** You have a DataFrame df with a column 'A' of integers. For example:\n",
"```kotlin\n",
"val df = dataFrameOf(\"A\")(1, 2, 2, 3, 4, 5, 5, 5, 6, 7, 7)\n",
"```\n",
"How do you filter out rows which contain the same integer as the row immediately above?\n",
"\n",
"You should be left with a column containing the following values:\n",
"```\n",
"1, 2, 3, 4, 5, 6, 7\n",
"```"
]
},
{
"cell_type": "code",
"metadata": {
"ExecuteTime": {
"end_time": "2025-03-07T10:53:37.278799Z",
"start_time": "2025-03-07T10:53:37.206548Z"
}
},
"source": [
"val df = dataFrameOf(\"A\")(1, 2, 2, 3, 4, 5, 5, 5, 6, 7, 7)\n",
"df"
],
"outputs": [
{
"data": {
"text/html": [
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" "
],
"application/kotlindataframe+json": "{\"$version\":\"2.1.1\",\"metadata\":{\"columns\":[\"A\"],\"types\":[{\"kind\":\"ValueColumn\",\"type\":\"kotlin.Int\"}],\"nrow\":11,\"ncol\":1},\"kotlin_dataframe\":[{\"A\":1},{\"A\":2},{\"A\":2},{\"A\":3},{\"A\":4},{\"A\":5},{\"A\":5},{\"A\":5},{\"A\":6},{\"A\":7},{\"A\":7}]}"
},
"execution_count": 21,
"metadata": {},
"output_type": "execute_result"
}
],
"execution_count": 21
},
{
"cell_type": "code",
"metadata": {
"ExecuteTime": {
"end_time": "2025-03-07T10:53:37.375317Z",
"start_time": "2025-03-07T10:53:37.332872Z"
}
},
"source": [
"df.filter { prev()?.A != A }"
],
"outputs": [
{
"data": {
"text/html": [
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" "
],
"application/kotlindataframe+json": "{\"$version\":\"2.1.1\",\"metadata\":{\"columns\":[\"A\"],\"types\":[{\"kind\":\"ValueColumn\",\"type\":\"kotlin.Int\"}],\"nrow\":7,\"ncol\":1},\"kotlin_dataframe\":[{\"A\":1},{\"A\":2},{\"A\":3},{\"A\":4},{\"A\":5},{\"A\":6},{\"A\":7}]}"
},
"execution_count": 22,
"metadata": {},
"output_type": "execute_result"
}
],
"execution_count": 22
},
{
"cell_type": "code",
"metadata": {
"ExecuteTime": {
"end_time": "2025-03-07T10:53:37.437119Z",
"start_time": "2025-03-07T10:53:37.404397Z"
}
},
"source": [
"df.filter { diffOrNull { A } != 0 }"
],
"outputs": [
{
"data": {
"text/html": [
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" "
],
"application/kotlindataframe+json": "{\"$version\":\"2.1.1\",\"metadata\":{\"columns\":[\"A\"],\"types\":[{\"kind\":\"ValueColumn\",\"type\":\"kotlin.Int\"}],\"nrow\":7,\"ncol\":1},\"kotlin_dataframe\":[{\"A\":1},{\"A\":2},{\"A\":3},{\"A\":4},{\"A\":5},{\"A\":6},{\"A\":7}]}"
},
"execution_count": 23,
"metadata": {},
"output_type": "execute_result"
}
],
"execution_count": 23
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We could use `distinct()` here but it won't work as desired if A is [1, 1, 2, 2, 1, 1] for example."
]
},
{
"cell_type": "code",
"metadata": {
"ExecuteTime": {
"end_time": "2025-03-07T10:53:37.498641Z",
"start_time": "2025-03-07T10:53:37.482235Z"
}
},
"source": [
"df.distinct()"
],
"outputs": [
{
"data": {
"text/html": [
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" "
],
"application/kotlindataframe+json": "{\"$version\":\"2.1.1\",\"metadata\":{\"columns\":[\"A\"],\"types\":[{\"kind\":\"ValueColumn\",\"type\":\"kotlin.Int\"}],\"nrow\":7,\"ncol\":1},\"kotlin_dataframe\":[{\"A\":1},{\"A\":2},{\"A\":3},{\"A\":4},{\"A\":5},{\"A\":6},{\"A\":7}]}"
},
"execution_count": 24,
"metadata": {},
"output_type": "execute_result"
}
],
"execution_count": 24
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**21.** Given a DataFrame of random numetic values:\n",
"```kotlin\n",
"val df = dataFrameOf(\"a\", \"b\", \"c\").randomDouble(5) // this is a 5x3 DataFrame of double values\n",
"```\n",
"\n",
"how do you subtract the row mean from each element in the row?"
]
},
{
"cell_type": "code",
"metadata": {
"ExecuteTime": {
"end_time": "2025-03-07T10:53:37.590708Z",
"start_time": "2025-03-07T10:53:37.522605Z"
}
},
"source": [
"val df = dataFrameOf(\"a\", \"b\", \"c\").randomDouble(5)\n",
"df"
],
"outputs": [
{
"data": {
"text/html": [
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" | a | b | c |
|---|
| 0,144975 | 0,261119 | 0,528434 |
| 0,153302 | 0,580306 | 0,093150 |
| 0,508909 | 0,506442 | 0,077993 |
| 0,902918 | 0,477114 | 0,684444 |
| 0,647270 | 0,956375 | 0,962276 |
\n",
" \n",
" \n",
" "
],
"application/kotlindataframe+json": "{\"$version\":\"2.1.1\",\"metadata\":{\"columns\":[\"a\",\"b\",\"c\"],\"types\":[{\"kind\":\"ValueColumn\",\"type\":\"kotlin.Double\"},{\"kind\":\"ValueColumn\",\"type\":\"kotlin.Double\"},{\"kind\":\"ValueColumn\",\"type\":\"kotlin.Double\"}],\"nrow\":5,\"ncol\":3},\"kotlin_dataframe\":[{\"a\":0.14497540832799782,\"b\":0.2611190611932894,\"c\":0.5284338371179651},{\"a\":0.15330153679836755,\"b\":0.5803055707267688,\"c\":0.09315007682405196},{\"a\":0.5089090053833354,\"b\":0.5064422067279286,\"c\":0.07799304096864379},{\"a\":0.9029178745621944,\"b\":0.47711432610080773,\"c\":0.6844443767079936},{\"a\":0.6472699280278156,\"b\":0.9563748310231902,\"c\":0.9622763313978725}]}"
},
"execution_count": 25,
"metadata": {},
"output_type": "execute_result"
}
],
"execution_count": 25
},
{
"cell_type": "code",
"metadata": {
"ExecuteTime": {
"end_time": "2025-03-07T10:53:37.680342Z",
"start_time": "2025-03-07T10:53:37.634755Z"
}
},
"source": [
"df.update { colsOf() }\n",
" .with { it - rowMean() }"
],
"outputs": [
{
"data": {
"text/html": [
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" | a | b | c |
|---|
| -0,166534 | -0,050390 | 0,216924 |
| -0,122284 | 0,304720 | -0,182436 |
| 0,144461 | 0,141994 | -0,286455 |
| 0,214759 | -0,211045 | -0,003714 |
| -0,208037 | 0,101068 | 0,106969 |
\n",
" \n",
" \n",
" "
],
"application/kotlindataframe+json": "{\"$version\":\"2.1.1\",\"metadata\":{\"columns\":[\"a\",\"b\",\"c\"],\"types\":[{\"kind\":\"ValueColumn\",\"type\":\"kotlin.Double\"},{\"kind\":\"ValueColumn\",\"type\":\"kotlin.Double\"},{\"kind\":\"ValueColumn\",\"type\":\"kotlin.Double\"}],\"nrow\":5,\"ncol\":3},\"kotlin_dataframe\":[{\"a\":-0.16653402721841964,\"b\":-0.05039037435312804,\"c\":0.21692440157154763},{\"a\":-0.12228419131802853,\"b\":0.3047198426103727,\"c\":-0.18243565129234413},{\"a\":0.1444609210233661,\"b\":0.14199412236795933,\"c\":-0.2864550433913255},{\"a\":0.21475901543852915,\"b\":-0.21104453302285753,\"c\":-0.003714482415671627},{\"a\":-0.20803710212181048,\"b\":0.10106780087356415,\"c\":0.10696930124824644}]}"
},
"execution_count": 26,
"metadata": {},
"output_type": "execute_result"
}
],
"execution_count": 26
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**22.** Suppose you have DataFrame with 10 columns of real numbers, for example:\n",
"```kotlin\n",
"val names = ('a'..'j').map { it.toString() }\n",
"val df = dataFrameOf(names).randomDouble(5)\n",
"```\n",
"\n",
"Which column of number has the smallest sum? Return that column's label."
]
},
{
"cell_type": "code",
"metadata": {
"ExecuteTime": {
"end_time": "2025-03-07T10:53:37.818964Z",
"start_time": "2025-03-07T10:53:37.684463Z"
}
},
"source": [
"val names = ('a'..'j').map { it.toString() }\n",
"val df = dataFrameOf(names).randomDouble(5)\n",
"df"
],
"outputs": [
{
"data": {
"text/html": [
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" | a | b | c | d | e | f | g | h | i | j |
|---|
| 0,649356 | 0,397835 | 0,063387 | 0,798222 | 0,113533 | 0,226745 | 0,770447 | 0,354243 | 0,463067 | 0,650795 |
| 0,751857 | 0,947548 | 0,724005 | 0,814034 | 0,518926 | 0,571749 | 0,297992 | 0,569584 | 0,307910 | 0,613237 |
| 0,859521 | 0,074963 | 0,924321 | 0,515546 | 0,324357 | 0,275093 | 0,199628 | 0,558562 | 0,129252 | 0,313497 |
| 0,080708 | 0,319833 | 0,482264 | 0,061206 | 0,622684 | 0,103254 | 0,096697 | 0,721548 | 0,610209 | 0,069625 |
| 0,170174 | 0,832971 | 0,315927 | 0,820261 | 0,467749 | 0,660628 | 0,763705 | 0,109415 | 0,948070 | 0,625567 |
\n",
" \n",
" \n",
" "
],
"application/kotlindataframe+json": "{\"$version\":\"2.1.1\",\"metadata\":{\"columns\":[\"a\",\"b\",\"c\",\"d\",\"e\",\"f\",\"g\",\"h\",\"i\",\"j\"],\"types\":[{\"kind\":\"ValueColumn\",\"type\":\"kotlin.Double\"},{\"kind\":\"ValueColumn\",\"type\":\"kotlin.Double\"},{\"kind\":\"ValueColumn\",\"type\":\"kotlin.Double\"},{\"kind\":\"ValueColumn\",\"type\":\"kotlin.Double\"},{\"kind\":\"ValueColumn\",\"type\":\"kotlin.Double\"},{\"kind\":\"ValueColumn\",\"type\":\"kotlin.Double\"},{\"kind\":\"ValueColumn\",\"type\":\"kotlin.Double\"},{\"kind\":\"ValueColumn\",\"type\":\"kotlin.Double\"},{\"kind\":\"ValueColumn\",\"type\":\"kotlin.Double\"},{\"kind\":\"ValueColumn\",\"type\":\"kotlin.Double\"}],\"nrow\":5,\"ncol\":10},\"kotlin_dataframe\":[{\"a\":0.6493563710786623,\"b\":0.397835369604407,\"c\":0.06338731572139988,\"d\":0.7982218005967107,\"e\":0.11353251481510551,\"f\":0.22674494149283497,\"g\":0.7704465004752168,\"h\":0.35424286328842647,\"i\":0.4630670078286009,\"j\":0.6507953796285432},{\"a\":0.7518572074244319,\"b\":0.9475476868012341,\"c\":0.7240053071670721,\"d\":0.8140339557353566,\"e\":0.5189263586641782,\"f\":0.5717485260534912,\"g\":0.29799209778715297,\"h\":0.5695842607647181,\"i\":0.3079102336551607,\"j\":0.6132367770204242},{\"a\":0.8595214730875305,\"b\":0.07496277086070569,\"c\":0.924320574443025,\"d\":0.5155463658745273,\"e\":0.3243573459052579,\"f\":0.27509307858624155,\"g\":0.1996277745207944,\"h\":0.5585622646514512,\"i\":0.1292518636539014,\"j\":0.31349744743929064},{\"a\":0.08070773788993102,\"b\":0.3198334379507849,\"c\":0.4822639342700412,\"d\":0.061206404774003076,\"e\":0.6226838134323756,\"f\":0.10325422898517467,\"g\":0.09669749554704232,\"h\":0.721547746781427,\"i\":0.6102089314756252,\"j\":0.06962510331827465},{\"a\":0.1701742099377852,\"b\":0.8329712045956171,\"c\":0.3159265062249631,\"d\":0.8202614005197886,\"e\":0.46774888154961536,\"f\":0.6606278717430324,\"g\":0.7637053542055305,\"h\":0.1094150681618915,\"i\":0.9480700355994607,\"j\":0.6255673978906097}]}"
},
"execution_count": 27,
"metadata": {},
"output_type": "execute_result"
}
],
"execution_count": 27
},
{
"cell_type": "code",
"metadata": {
"ExecuteTime": {
"end_time": "2025-03-07T10:53:37.911207Z",
"start_time": "2025-03-07T10:53:37.869335Z"
}
},
"source": [
"df.sum().transpose().minBy(\"value\")[\"name\"]"
],
"outputs": [
{
"data": {
"text/plain": [
"f"
]
},
"execution_count": 28,
"metadata": {},
"output_type": "execute_result"
}
],
"execution_count": 28
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**23.** How do you count how many unique rows a DataFrame has (i.e. ignore all rows that are duplicates)?"
]
},
{
"cell_type": "code",
"metadata": {
"ExecuteTime": {
"end_time": "2025-03-07T10:53:37.997337Z",
"start_time": "2025-03-07T10:53:37.921150Z"
}
},
"source": [
"val df = dataFrameOf(\"a\", \"b\", \"c\").randomInt(30, 0..2)\n",
"df.distinct().count()"
],
"outputs": [
{
"data": {
"text/plain": [
"21"
]
},
"execution_count": 29,
"metadata": {},
"output_type": "execute_result"
}
],
"execution_count": 29
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**24.** In the cell below, you have a DataFrame `df` that consists of 10 columns of floating-point numbers. Exactly 5 entries in each row are NaN values.\n",
"\n",
"For each row of the DataFrame, find the *column* which contains the *third* NaN value.\n",
"\n",
"You should return a column of column labels: `e, c, d, h, d`"
]
},
{
"cell_type": "code",
"metadata": {
"ExecuteTime": {
"end_time": "2025-03-07T10:53:38.115519Z",
"start_time": "2025-03-07T10:53:38.020155Z"
}
},
"source": [
"val nan = Double.NaN\n",
"val names = ('a'..'j').map { it.toString() }\n",
"val data = listOf(\n",
" 0.04, nan, nan, 0.25, nan, 0.43, 0.71, 0.51, nan, nan,\n",
" nan, nan, nan, 0.04, 0.76, nan, nan, 0.67, 0.76, 0.16,\n",
" nan, nan, 0.5, nan, 0.31, 0.4, nan, nan, 0.24, 0.01,\n",
" 0.49, nan, nan, 0.62, 0.73, 0.26, 0.85, nan, nan, nan,\n",
" nan, nan, 0.41, nan, 0.05, nan, 0.61, nan, 0.48, 0.68,\n",
")\n",
"val df = dataFrameOf(names)(*data.toTypedArray())\n",
"df"
],
"outputs": [
{
"data": {
"text/html": [
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" | a | b | c | d | e | f | g | h | i | j |
|---|
| 0,040000 | NaN | NaN | 0,250000 | NaN | 0,430000 | 0,710000 | 0,510000 | NaN | NaN |
| NaN | NaN | NaN | 0,040000 | 0,760000 | NaN | NaN | 0,670000 | 0,760000 | 0,160000 |
| NaN | NaN | 0,500000 | NaN | 0,310000 | 0,400000 | NaN | NaN | 0,240000 | 0,010000 |
| 0,490000 | NaN | NaN | 0,620000 | 0,730000 | 0,260000 | 0,850000 | NaN | NaN | NaN |
| NaN | NaN | 0,410000 | NaN | 0,050000 | NaN | 0,610000 | NaN | 0,480000 | 0,680000 |
\n",
" \n",
" \n",
" "
],
"application/kotlindataframe+json": "{\"$version\":\"2.1.1\",\"metadata\":{\"columns\":[\"a\",\"b\",\"c\",\"d\",\"e\",\"f\",\"g\",\"h\",\"i\",\"j\"],\"types\":[{\"kind\":\"ValueColumn\",\"type\":\"kotlin.Double\"},{\"kind\":\"ValueColumn\",\"type\":\"kotlin.Double\"},{\"kind\":\"ValueColumn\",\"type\":\"kotlin.Double\"},{\"kind\":\"ValueColumn\",\"type\":\"kotlin.Double\"},{\"kind\":\"ValueColumn\",\"type\":\"kotlin.Double\"},{\"kind\":\"ValueColumn\",\"type\":\"kotlin.Double\"},{\"kind\":\"ValueColumn\",\"type\":\"kotlin.Double\"},{\"kind\":\"ValueColumn\",\"type\":\"kotlin.Double\"},{\"kind\":\"ValueColumn\",\"type\":\"kotlin.Double\"},{\"kind\":\"ValueColumn\",\"type\":\"kotlin.Double\"}],\"nrow\":5,\"ncol\":10},\"kotlin_dataframe\":[{\"a\":0.04,\"b\":NaN,\"c\":NaN,\"d\":0.25,\"e\":NaN,\"f\":0.43,\"g\":0.71,\"h\":0.51,\"i\":NaN,\"j\":NaN},{\"a\":NaN,\"b\":NaN,\"c\":NaN,\"d\":0.04,\"e\":0.76,\"f\":NaN,\"g\":NaN,\"h\":0.67,\"i\":0.76,\"j\":0.16},{\"a\":NaN,\"b\":NaN,\"c\":0.5,\"d\":NaN,\"e\":0.31,\"f\":0.4,\"g\":NaN,\"h\":NaN,\"i\":0.24,\"j\":0.01},{\"a\":0.49,\"b\":NaN,\"c\":NaN,\"d\":0.62,\"e\":0.73,\"f\":0.26,\"g\":0.85,\"h\":NaN,\"i\":NaN,\"j\":NaN},{\"a\":NaN,\"b\":NaN,\"c\":0.41,\"d\":NaN,\"e\":0.05,\"f\":NaN,\"g\":0.61,\"h\":NaN,\"i\":0.48,\"j\":0.68}]}"
},
"execution_count": 30,
"metadata": {},
"output_type": "execute_result"
}
],
"execution_count": 30
},
{
"cell_type": "code",
"metadata": {
"ExecuteTime": {
"end_time": "2025-03-07T10:53:38.167323Z",
"start_time": "2025-03-07T10:53:38.118278Z"
}
},
"source": [
"df.mapToColumn(\"res\") { \n",
" namedValuesOf()\n",
" .filter { it.value.isNaN() }.drop(2)\n",
" .firstOrNull()?.name \n",
"}"
],
"outputs": [
{
"data": {
"text/html": [
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" "
],
"application/kotlindataframe+json": "{\"$version\":\"2.1.1\",\"metadata\":{\"columns\":[\"res\"],\"types\":[{\"kind\":\"ValueColumn\",\"type\":\"kotlin.String\"}],\"nrow\":5,\"ncol\":1},\"kotlin_dataframe\":[{\"res\":\"e\"},{\"res\":\"c\"},{\"res\":\"d\"},{\"res\":\"h\"},{\"res\":\"d\"}]}"
},
"execution_count": 31,
"metadata": {},
"output_type": "execute_result"
}
],
"execution_count": 31
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**25.** A DataFrame has a column of groups 'grps' and and column of integer values 'vals':\n",
"```kotlin\n",
"val grps by column(\"a\", \"a\", \"a\", \"b\", \"b\", \"c\", \"a\", \"a\", \"b\", \"c\", \"c\", \"c\", \"b\", \"b\", \"c\")\n",
"val vals by column(12, 345, 3, 1, 45, 14, 4, 52, 54, 23, 235, 21, 57, 3, 87)\n",
"\n",
"val df = dataFrameOf(grps, vals)\n",
"```\n",
"\n",
"For each group, find the sum of the three greatest values. You should end up with the answer as follows:\n",
"```\n",
"grps\n",
"a 409\n",
"b 156\n",
"c 345\n",
"```"
]
},
{
"cell_type": "code",
"metadata": {
"ExecuteTime": {
"end_time": "2025-03-07T10:53:38.308071Z",
"start_time": "2025-03-07T10:53:38.189647Z"
}
},
"source": [
"val grps by columnOf(\"a\", \"a\", \"a\", \"b\", \"b\", \"c\", \"a\", \"a\", \"b\", \"c\", \"c\", \"c\", \"b\", \"b\", \"c\")\n",
"val vals by columnOf(12, 345, 3, 1, 45, 14, 4, 52, 54, 23, 235, 21, 57, 3, 87)\n",
"\n",
"val df = dataFrameOf(grps, vals)\n",
"df"
],
"outputs": [
{
"data": {
"text/html": [
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" | grps | vals |
|---|
| a | 12 |
| a | 345 |
| a | 3 |
| b | 1 |
| b | 45 |
| c | 14 |
| a | 4 |
| a | 52 |
| b | 54 |
| c | 23 |
| c | 235 |
| c | 21 |
| b | 57 |
| b | 3 |
| c | 87 |
\n",
" \n",
" \n",
" "
],
"application/kotlindataframe+json": "{\"$version\":\"2.1.1\",\"metadata\":{\"columns\":[\"grps\",\"vals\"],\"types\":[{\"kind\":\"ValueColumn\",\"type\":\"kotlin.String\"},{\"kind\":\"ValueColumn\",\"type\":\"kotlin.Int\"}],\"nrow\":15,\"ncol\":2},\"kotlin_dataframe\":[{\"grps\":\"a\",\"vals\":12},{\"grps\":\"a\",\"vals\":345},{\"grps\":\"a\",\"vals\":3},{\"grps\":\"b\",\"vals\":1},{\"grps\":\"b\",\"vals\":45},{\"grps\":\"c\",\"vals\":14},{\"grps\":\"a\",\"vals\":4},{\"grps\":\"a\",\"vals\":52},{\"grps\":\"b\",\"vals\":54},{\"grps\":\"c\",\"vals\":23},{\"grps\":\"c\",\"vals\":235},{\"grps\":\"c\",\"vals\":21},{\"grps\":\"b\",\"vals\":57},{\"grps\":\"b\",\"vals\":3},{\"grps\":\"c\",\"vals\":87}]}"
},
"execution_count": 32,
"metadata": {},
"output_type": "execute_result"
}
],
"execution_count": 32
},
{
"cell_type": "code",
"metadata": {
"ExecuteTime": {
"end_time": "2025-03-07T10:53:38.354096Z",
"start_time": "2025-03-07T10:53:38.312059Z"
}
},
"source": [
"df.groupBy { grps }.aggregate { \n",
" vals.sortDesc().take(3).sum() into \"res\"\n",
"}"
],
"outputs": [
{
"data": {
"text/html": [
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" "
],
"application/kotlindataframe+json": "{\"$version\":\"2.1.1\",\"metadata\":{\"columns\":[\"grps\",\"res\"],\"types\":[{\"kind\":\"ValueColumn\",\"type\":\"kotlin.String\"},{\"kind\":\"ValueColumn\",\"type\":\"kotlin.Int\"}],\"nrow\":3,\"ncol\":2},\"kotlin_dataframe\":[{\"grps\":\"a\",\"res\":409},{\"grps\":\"b\",\"res\":156},{\"grps\":\"c\",\"res\":345}]}"
},
"execution_count": 33,
"metadata": {},
"output_type": "execute_result"
}
],
"execution_count": 33
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**26.** The DataFrame `df` constructed below has two integer columns 'A' and 'B'. The values in 'A' are between 1 and 100 (inclusive).\n",
"\n",
"For each group of 10 consecutive integers in 'A' (i.e. `(0, 10]`, `(10, 20]`, ...), calculate the sum of the corresponding values in column 'B'.\n",
"\n",
"The answer as follows:\n",
"\n",
"```\n",
"A\n",
"(0, 10] 635\n",
"(10, 20] 360\n",
"(20, 30] 315\n",
"(30, 40] 306\n",
"(40, 50] 750\n",
"(50, 60] 284\n",
"(60, 70] 424\n",
"(70, 80] 526\n",
"(80, 90] 835\n",
"(90, 100] 852\n",
"```"
]
},
{
"cell_type": "code",
"metadata": {
"ExecuteTime": {
"end_time": "2025-03-07T10:53:38.469189Z",
"start_time": "2025-03-07T10:53:38.395243Z"
}
},
"source": [
"import kotlin.random.Random\n",
"\n",
"val random = Random(42)\n",
"val list = List(200) { random.nextInt(1, 101) }\n",
"val df = dataFrameOf(\"A\", \"B\")(*list.toTypedArray())\n",
"df"
],
"outputs": [
{
"data": {
"text/html": [
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" | A | B |
|---|
| 34 | 41 |
| 42 | 3 |
| 42 | 33 |
| 22 | 41 |
| 70 | 88 |
| 53 | 68 |
| 80 | 4 |
| 59 | 59 |
| 45 | 1 |
| 27 | 14 |
| 70 | 8 |
| 11 | 52 |
| 51 | 60 |
| 46 | 43 |
| 17 | 17 |
| 17 | 42 |
| 56 | 29 |
| 58 | 49 |
| 48 | 7 |
| 73 | 52 |
\n",
" \n",
" \n",
" "
],
"application/kotlindataframe+json": "{\"$version\":\"2.1.1\",\"metadata\":{\"columns\":[\"A\",\"B\"],\"types\":[{\"kind\":\"ValueColumn\",\"type\":\"kotlin.Int\"},{\"kind\":\"ValueColumn\",\"type\":\"kotlin.Int\"}],\"nrow\":100,\"ncol\":2},\"kotlin_dataframe\":[{\"A\":34,\"B\":41},{\"A\":42,\"B\":3},{\"A\":42,\"B\":33},{\"A\":22,\"B\":41},{\"A\":70,\"B\":88},{\"A\":53,\"B\":68},{\"A\":80,\"B\":4},{\"A\":59,\"B\":59},{\"A\":45,\"B\":1},{\"A\":27,\"B\":14},{\"A\":70,\"B\":8},{\"A\":11,\"B\":52},{\"A\":51,\"B\":60},{\"A\":46,\"B\":43},{\"A\":17,\"B\":17},{\"A\":17,\"B\":42},{\"A\":56,\"B\":29},{\"A\":58,\"B\":49},{\"A\":48,\"B\":7},{\"A\":73,\"B\":52}]}"
},
"execution_count": 34,
"metadata": {},
"output_type": "execute_result"
}
],
"execution_count": 34
},
{
"cell_type": "code",
"metadata": {
"ExecuteTime": {
"end_time": "2025-03-07T10:53:38.538736Z",
"start_time": "2025-03-07T10:53:38.475641Z"
}
},
"source": [
"df.groupBy { A.map { (it - 1) / 10 } }.sum { B }\n",
" .sortBy { A }\n",
" .convert { A }.with { \"(${it * 10}, ${it * 10 + 10}]\" }"
],
"outputs": [
{
"data": {
"text/html": [
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" | A | B |
|---|
| (0, 10] | 353 |
| (10, 20] | 873 |
| (20, 30] | 321 |
| (30, 40] | 322 |
| (40, 50] | 432 |
| (50, 60] | 754 |
| (60, 70] | 405 |
| (70, 80] | 561 |
| (80, 90] | 657 |
| (90, 100] | 527 |
\n",
" \n",
" \n",
" "
],
"application/kotlindataframe+json": "{\"$version\":\"2.1.1\",\"metadata\":{\"columns\":[\"A\",\"B\"],\"types\":[{\"kind\":\"ValueColumn\",\"type\":\"kotlin.String\"},{\"kind\":\"ValueColumn\",\"type\":\"kotlin.Int\"}],\"nrow\":10,\"ncol\":2},\"kotlin_dataframe\":[{\"A\":\"(0, 10]\",\"B\":353},{\"A\":\"(10, 20]\",\"B\":873},{\"A\":\"(20, 30]\",\"B\":321},{\"A\":\"(30, 40]\",\"B\":322},{\"A\":\"(40, 50]\",\"B\":432},{\"A\":\"(50, 60]\",\"B\":754},{\"A\":\"(60, 70]\",\"B\":405},{\"A\":\"(70, 80]\",\"B\":561},{\"A\":\"(80, 90]\",\"B\":657},{\"A\":\"(90, 100]\",\"B\":527}]}"
},
"execution_count": 35,
"metadata": {},
"output_type": "execute_result"
}
],
"execution_count": 35
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## DataFrames: harder problems\n",
"\n",
"### These might require a bit of thinking outside the box...\n",
"\n",
"Difficulty: hard"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**27.** Consider a DataFrame `df` where there is an integer column 'X':\n",
"```kotlin\n",
"val df = dataFrameOf(\"X\")(7, 2, 0, 3, 4, 2, 5, 0, 3 , 4)\n",
"```\n",
"For each value, count the difference back to the previous zero (or the start of the column, whichever is closer). These values should therefore be\n",
"\n",
"```\n",
"[1, 2, 0, 1, 2, 3, 4, 0, 1, 2]\n",
"```\n",
"\n",
"Make this a new column 'Y'."
]
},
{
"cell_type": "code",
"metadata": {
"ExecuteTime": {
"end_time": "2025-03-07T10:53:38.636595Z",
"start_time": "2025-03-07T10:53:38.583434Z"
}
},
"source": [
"val df = dataFrameOf(\"X\")(7, 2, 0, 3, 4, 2, 5, 0, 3, 4)\n",
"df"
],
"outputs": [
{
"data": {
"text/html": [
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" "
],
"application/kotlindataframe+json": "{\"$version\":\"2.1.1\",\"metadata\":{\"columns\":[\"X\"],\"types\":[{\"kind\":\"ValueColumn\",\"type\":\"kotlin.Int\"}],\"nrow\":10,\"ncol\":1},\"kotlin_dataframe\":[{\"X\":7},{\"X\":2},{\"X\":0},{\"X\":3},{\"X\":4},{\"X\":2},{\"X\":5},{\"X\":0},{\"X\":3},{\"X\":4}]}"
},
"execution_count": 36,
"metadata": {},
"output_type": "execute_result"
}
],
"execution_count": 36
},
{
"cell_type": "code",
"metadata": {
"ExecuteTime": {
"end_time": "2025-03-07T10:53:38.674903Z",
"start_time": "2025-03-07T10:53:38.641118Z"
}
},
"source": [
"df.mapToColumn(\"Y\") {\n",
" if (it.X == 0) 0 else (prev()?.newValue() ?: 0) + 1\n",
"}"
],
"outputs": [
{
"data": {
"text/html": [
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" "
],
"application/kotlindataframe+json": "{\"$version\":\"2.1.1\",\"metadata\":{\"columns\":[\"Y\"],\"types\":[{\"kind\":\"ValueColumn\",\"type\":\"kotlin.Int\"}],\"nrow\":10,\"ncol\":1},\"kotlin_dataframe\":[{\"Y\":1},{\"Y\":2},{\"Y\":0},{\"Y\":1},{\"Y\":2},{\"Y\":3},{\"Y\":4},{\"Y\":0},{\"Y\":1},{\"Y\":2}]}"
},
"execution_count": 37,
"metadata": {},
"output_type": "execute_result"
}
],
"execution_count": 37
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**28.** Consider the DataFrame constructed below which contains rows and columns of numerical data.\n",
"\n",
"Create a list of the column-row index locations of the 3 largest values in this DataFrame. In thi case, the answer should be:\n",
"```\n",
"[(0, d), (2, c), (3, f)]\n",
"```"
]
},
{
"cell_type": "code",
"metadata": {
"ExecuteTime": {
"end_time": "2025-03-07T10:53:38.824081Z",
"start_time": "2025-03-07T10:53:38.717585Z"
}
},
"source": [
"val names = ('a'..'h').map { it.toString() } // val names = (0..7).map { it.toString() }\n",
"val random = Random(30)\n",
"val list = List(64) { random.nextInt(1, 101) }\n",
"val df = dataFrameOf(names)(*list.toTypedArray())\n",
"df"
],
"outputs": [
{
"data": {
"text/html": [
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" | a | b | c | d | e | f | g | h |
|---|
| 43 | 88 | 66 | 100 | 9 | 59 | 74 | 23 |
| 6 | 63 | 43 | 58 | 4 | 85 | 9 | 25 |
| 49 | 59 | 100 | 52 | 28 | 1 | 19 | 81 |
| 92 | 41 | 13 | 57 | 28 | 97 | 63 | 39 |
| 4 | 59 | 72 | 65 | 50 | 35 | 14 | 31 |
| 55 | 74 | 33 | 66 | 17 | 39 | 80 | 38 |
| 18 | 64 | 91 | 39 | 80 | 55 | 65 | 2 |
| 19 | 76 | 75 | 18 | 32 | 97 | 1 | 32 |
\n",
" \n",
" \n",
" "
],
"application/kotlindataframe+json": "{\"$version\":\"2.1.1\",\"metadata\":{\"columns\":[\"a\",\"b\",\"c\",\"d\",\"e\",\"f\",\"g\",\"h\"],\"types\":[{\"kind\":\"ValueColumn\",\"type\":\"kotlin.Int\"},{\"kind\":\"ValueColumn\",\"type\":\"kotlin.Int\"},{\"kind\":\"ValueColumn\",\"type\":\"kotlin.Int\"},{\"kind\":\"ValueColumn\",\"type\":\"kotlin.Int\"},{\"kind\":\"ValueColumn\",\"type\":\"kotlin.Int\"},{\"kind\":\"ValueColumn\",\"type\":\"kotlin.Int\"},{\"kind\":\"ValueColumn\",\"type\":\"kotlin.Int\"},{\"kind\":\"ValueColumn\",\"type\":\"kotlin.Int\"}],\"nrow\":8,\"ncol\":8},\"kotlin_dataframe\":[{\"a\":43,\"b\":88,\"c\":66,\"d\":100,\"e\":9,\"f\":59,\"g\":74,\"h\":23},{\"a\":6,\"b\":63,\"c\":43,\"d\":58,\"e\":4,\"f\":85,\"g\":9,\"h\":25},{\"a\":49,\"b\":59,\"c\":100,\"d\":52,\"e\":28,\"f\":1,\"g\":19,\"h\":81},{\"a\":92,\"b\":41,\"c\":13,\"d\":57,\"e\":28,\"f\":97,\"g\":63,\"h\":39},{\"a\":4,\"b\":59,\"c\":72,\"d\":65,\"e\":50,\"f\":35,\"g\":14,\"h\":31},{\"a\":55,\"b\":74,\"c\":33,\"d\":66,\"e\":17,\"f\":39,\"g\":80,\"h\":38},{\"a\":18,\"b\":64,\"c\":91,\"d\":39,\"e\":80,\"f\":55,\"g\":65,\"h\":2},{\"a\":19,\"b\":76,\"c\":75,\"d\":18,\"e\":32,\"f\":97,\"g\":1,\"h\":32}]}"
},
"execution_count": 38,
"metadata": {},
"output_type": "execute_result"
}
],
"execution_count": 38
},
{
"cell_type": "code",
"metadata": {
"ExecuteTime": {
"end_time": "2025-03-07T10:53:38.885757Z",
"start_time": "2025-03-07T10:53:38.828604Z"
}
},
"source": [
"df.add(\"index\") { index() }\n",
" .gather { dropLast() }.into(\"name\", \"vals\")\n",
" .sortByDesc(\"vals\").take(3)[\"index\", \"name\"]"
],
"outputs": [
{
"data": {
"text/html": [
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" "
],
"application/kotlindataframe+json": "{\"$version\":\"2.1.1\",\"metadata\":{\"columns\":[\"index\",\"name\"],\"types\":[{\"kind\":\"ValueColumn\",\"type\":\"kotlin.Int\"},{\"kind\":\"ValueColumn\",\"type\":\"kotlin.String\"}],\"nrow\":3,\"ncol\":2},\"kotlin_dataframe\":[{\"index\":0,\"name\":\"d\"},{\"index\":2,\"name\":\"c\"},{\"index\":3,\"name\":\"f\"}]}"
},
"execution_count": 39,
"metadata": {},
"output_type": "execute_result"
}
],
"execution_count": 39
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**29.** You are given the DataFrame below with a column of group IDs, 'grps', and a column of corresponding integer values, 'vals'.\n",
"\n",
"```kotlin\n",
"val random = Random(31)\n",
"val lab = listOf(\"A\", \"B\")\n",
"\n",
"val vals by columnOf(List(15) { random.nextInt(-30, 30) })\n",
"val grps by columnOf(List(15) { lab[random.nextInt(0, 2)] })\n",
"\n",
"val df = dataFrameOf(vals, grps)\n",
"```\n",
"\n",
"Create a new column 'patched_values' which contains the same values as the 'vals' any negative values in 'vals' with the group mean:\n",
"\n",
"```\n",
"vals grps patched_vals\n",
" -17 B 21.0\n",
" -7 B 21.0\n",
" 28 B 28.0\n",
" 16 B 16.0\n",
" -21 B 21.0\n",
" 19 B 19.0\n",
" -2 B 21.0\n",
" -19 B 21.0\n",
" 16 A 16.0\n",
" 9 A 9.0\n",
" -14 A 16.0\n",
" -19 A 16.0\n",
" -22 A 16.0\n",
" -1 A 16.0\n",
" 23 A 23.0\n",
"```"
]
},
{
"cell_type": "code",
"metadata": {
"ExecuteTime": {
"end_time": "2025-03-07T10:53:39.036710Z",
"start_time": "2025-03-07T10:53:38.944187Z"
}
},
"source": [
"val random = Random(31)\n",
"val lab = listOf(\"A\", \"B\")\n",
"\n",
"val vals by columnOf(*Array(15) { random.nextInt(-30, 30) })\n",
"val grps by columnOf(*Array(15) { lab[random.nextInt(0, 2)] })\n",
"\n",
"val df = dataFrameOf(vals, grps)\n",
"df"
],
"outputs": [
{
"data": {
"text/html": [
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" | vals | grps |
|---|
| -17 | B |
| -7 | B |
| 16 | A |
| 28 | B |
| 9 | A |
| 16 | B |
| -21 | B |
| -14 | A |
| -19 | A |
| -22 | A |
| 19 | B |
| -2 | B |
| -1 | A |
| -19 | B |
| 23 | A |
\n",
" \n",
" \n",
" "
],
"application/kotlindataframe+json": "{\"$version\":\"2.1.1\",\"metadata\":{\"columns\":[\"vals\",\"grps\"],\"types\":[{\"kind\":\"ValueColumn\",\"type\":\"kotlin.Int\"},{\"kind\":\"ValueColumn\",\"type\":\"kotlin.String\"}],\"nrow\":15,\"ncol\":2},\"kotlin_dataframe\":[{\"vals\":-17,\"grps\":\"B\"},{\"vals\":-7,\"grps\":\"B\"},{\"vals\":16,\"grps\":\"A\"},{\"vals\":28,\"grps\":\"B\"},{\"vals\":9,\"grps\":\"A\"},{\"vals\":16,\"grps\":\"B\"},{\"vals\":-21,\"grps\":\"B\"},{\"vals\":-14,\"grps\":\"A\"},{\"vals\":-19,\"grps\":\"A\"},{\"vals\":-22,\"grps\":\"A\"},{\"vals\":19,\"grps\":\"B\"},{\"vals\":-2,\"grps\":\"B\"},{\"vals\":-1,\"grps\":\"A\"},{\"vals\":-19,\"grps\":\"B\"},{\"vals\":23,\"grps\":\"A\"}]}"
},
"execution_count": 40,
"metadata": {},
"output_type": "execute_result"
}
],
"execution_count": 40
},
{
"cell_type": "code",
"metadata": {
"ExecuteTime": {
"end_time": "2025-03-07T10:53:39.139828Z",
"start_time": "2025-03-07T10:53:39.041221Z"
}
},
"source": [
"val means = df.filter { vals >= 0 }\n",
" .groupBy { grps }.mean()\n",
" .pivot { grps }.values { vals }\n",
"\n",
"df.add(\"patched_values\") {\n",
" if (vals < 0) means[grps] else vals.toDouble()\n",
"}"
],
"outputs": [
{
"data": {
"text/html": [
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" | vals | grps | patched_values |
|---|
| -17 | B | 21,000000 |
| -7 | B | 21,000000 |
| 16 | A | 16,000000 |
| 28 | B | 28,000000 |
| 9 | A | 9,000000 |
| 16 | B | 16,000000 |
| -21 | B | 21,000000 |
| -14 | A | 16,000000 |
| -19 | A | 16,000000 |
| -22 | A | 16,000000 |
| 19 | B | 19,000000 |
| -2 | B | 21,000000 |
| -1 | A | 16,000000 |
| -19 | B | 21,000000 |
| 23 | A | 23,000000 |
\n",
" \n",
" \n",
" "
],
"application/kotlindataframe+json": "{\"$version\":\"2.1.1\",\"metadata\":{\"columns\":[\"vals\",\"grps\",\"patched_values\"],\"types\":[{\"kind\":\"ValueColumn\",\"type\":\"kotlin.Int\"},{\"kind\":\"ValueColumn\",\"type\":\"kotlin.String\"},{\"kind\":\"ValueColumn\",\"type\":\"kotlin.Any\"}],\"nrow\":15,\"ncol\":3},\"kotlin_dataframe\":[{\"vals\":-17,\"grps\":\"B\",\"patched_values\":\"21.0\"},{\"vals\":-7,\"grps\":\"B\",\"patched_values\":\"21.0\"},{\"vals\":16,\"grps\":\"A\",\"patched_values\":\"16.0\"},{\"vals\":28,\"grps\":\"B\",\"patched_values\":\"28.0\"},{\"vals\":9,\"grps\":\"A\",\"patched_values\":\"9.0\"},{\"vals\":16,\"grps\":\"B\",\"patched_values\":\"16.0\"},{\"vals\":-21,\"grps\":\"B\",\"patched_values\":\"21.0\"},{\"vals\":-14,\"grps\":\"A\",\"patched_values\":\"16.0\"},{\"vals\":-19,\"grps\":\"A\",\"patched_values\":\"16.0\"},{\"vals\":-22,\"grps\":\"A\",\"patched_values\":\"16.0\"},{\"vals\":19,\"grps\":\"B\",\"patched_values\":\"19.0\"},{\"vals\":-2,\"grps\":\"B\",\"patched_values\":\"21.0\"},{\"vals\":-1,\"grps\":\"A\",\"patched_values\":\"16.0\"},{\"vals\":-19,\"grps\":\"B\",\"patched_values\":\"21.0\"},{\"vals\":23,\"grps\":\"A\",\"patched_values\":\"23.0\"}]}"
},
"execution_count": 41,
"metadata": {},
"output_type": "execute_result"
}
],
"execution_count": 41
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**30.** Implement a rolling mean over groups with window size 3, which ignores NaN value. For example consider the following DataFrame:\n",
"```kotlin\n",
"val group by columnOf(\"a\", \"a\", \"b\", \"b\", \"a\", \"b\", \"b\", \"b\", \"a\", \"b\", \"a\", \"b\")\n",
"val value by columnOf(1.0, 2.0, 3.0, Double.NaN, 2.0, 3.0, Double.NaN, 1.0, 7.0, 3.0, Double.NaN, 8.0)\n",
"\n",
"val df = dataFrameOf(group, value)\n",
"df\n",
"\n",
"group value\n",
"a 1.0\n",
"a 2.0\n",
"b 3.0\n",
"b NaN\n",
"a 2.0\n",
"b 3.0\n",
"b NaN\n",
"b 1.0\n",
"a 7.0\n",
"b 3.0\n",
"a NaN\n",
"b 8.0\n",
"```\n",
"The goal is:\n",
"```\n",
"1.000000\n",
"1.500000\n",
"3.000000\n",
"3.000000\n",
"1.666667\n",
"3.000000\n",
"3.000000\n",
"2.000000\n",
"3.666667\n",
"2.000000\n",
"4.500000\n",
"4.000000\n",
"```\n",
"E.g. the first window of size three for group 'b' has values 3.0, NaN and 3.0 and occurs at row index 5. Instead of being NaN the value in the new column at this row index should be 3.0 (just the two non-NaN values are used to compute the mean (3+3)/2)"
]
},
{
"cell_type": "code",
"metadata": {
"ExecuteTime": {
"end_time": "2025-03-07T10:53:39.273532Z",
"start_time": "2025-03-07T10:53:39.184331Z"
}
},
"source": [
"val groups by columnOf(\"a\", \"a\", \"b\", \"b\", \"a\", \"b\", \"b\", \"b\", \"a\", \"b\", \"a\", \"b\")\n",
"val value by columnOf(1.0, 2.0, 3.0, Double.NaN, 2.0, 3.0, Double.NaN, 1.0, 7.0, 3.0, Double.NaN, 8.0)\n",
"\n",
"val df = dataFrameOf(groups, value)\n",
"df"
],
"outputs": [
{
"data": {
"text/html": [
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" | groups | value |
|---|
| a | 1,000000 |
| a | 2,000000 |
| b | 3,000000 |
| b | NaN |
| a | 2,000000 |
| b | 3,000000 |
| b | NaN |
| b | 1,000000 |
| a | 7,000000 |
| b | 3,000000 |
| a | NaN |
| b | 8,000000 |
\n",
" \n",
" \n",
" "
],
"application/kotlindataframe+json": "{\"$version\":\"2.1.1\",\"metadata\":{\"columns\":[\"groups\",\"value\"],\"types\":[{\"kind\":\"ValueColumn\",\"type\":\"kotlin.String\"},{\"kind\":\"ValueColumn\",\"type\":\"kotlin.Double\"}],\"nrow\":12,\"ncol\":2},\"kotlin_dataframe\":[{\"groups\":\"a\",\"value\":1.0},{\"groups\":\"a\",\"value\":2.0},{\"groups\":\"b\",\"value\":3.0},{\"groups\":\"b\",\"value\":NaN},{\"groups\":\"a\",\"value\":2.0},{\"groups\":\"b\",\"value\":3.0},{\"groups\":\"b\",\"value\":NaN},{\"groups\":\"b\",\"value\":1.0},{\"groups\":\"a\",\"value\":7.0},{\"groups\":\"b\",\"value\":3.0},{\"groups\":\"a\",\"value\":NaN},{\"groups\":\"b\",\"value\":8.0}]}"
},
"execution_count": 42,
"metadata": {},
"output_type": "execute_result"
}
],
"execution_count": 42
},
{
"cell_type": "code",
"metadata": {
"ExecuteTime": {
"end_time": "2025-03-07T10:53:39.348878Z",
"start_time": "2025-03-07T10:53:39.280836Z"
}
},
"source": [
"df.add(\"id\") { index() }\n",
" .groupBy { groups }.add(\"res\") {\n",
" relative(-2..0).value.filter { !it.isNaN() }.mean()\n",
" }.concat()\n",
" .sortBy(\"id\")\n",
" .remove(\"id\")"
],
"outputs": [
{
"data": {
"text/html": [
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" | groups | value | res |
|---|
| a | 1,000000 | 1,000000 |
| a | 2,000000 | 1,500000 |
| b | 3,000000 | 3,000000 |
| b | NaN | 3,000000 |
| a | 2,000000 | 1,666667 |
| b | 3,000000 | 3,000000 |
| b | NaN | 3,000000 |
| b | 1,000000 | 2,000000 |
| a | 7,000000 | 3,666667 |
| b | 3,000000 | 2,000000 |
| a | NaN | 4,500000 |
| b | 8,000000 | 4,000000 |
\n",
" \n",
" \n",
" "
],
"application/kotlindataframe+json": "{\"$version\":\"2.1.1\",\"metadata\":{\"columns\":[\"groups\",\"value\",\"res\"],\"types\":[{\"kind\":\"ValueColumn\",\"type\":\"kotlin.String\"},{\"kind\":\"ValueColumn\",\"type\":\"kotlin.Double\"},{\"kind\":\"ValueColumn\",\"type\":\"kotlin.Double\"}],\"nrow\":12,\"ncol\":3},\"kotlin_dataframe\":[{\"groups\":\"a\",\"value\":1.0,\"res\":1.0},{\"groups\":\"a\",\"value\":2.0,\"res\":1.5},{\"groups\":\"b\",\"value\":3.0,\"res\":3.0},{\"groups\":\"b\",\"value\":NaN,\"res\":3.0},{\"groups\":\"a\",\"value\":2.0,\"res\":1.6666666666666667},{\"groups\":\"b\",\"value\":3.0,\"res\":3.0},{\"groups\":\"b\",\"value\":NaN,\"res\":3.0},{\"groups\":\"b\",\"value\":1.0,\"res\":2.0},{\"groups\":\"a\",\"value\":7.0,\"res\":3.6666666666666665},{\"groups\":\"b\",\"value\":3.0,\"res\":2.0},{\"groups\":\"a\",\"value\":NaN,\"res\":4.5},{\"groups\":\"b\",\"value\":8.0,\"res\":4.0}]}"
},
"execution_count": 43,
"metadata": {},
"output_type": "execute_result"
}
],
"execution_count": 43
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Date\n",
"Difficulty: easy/medium"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**31.** Create a column Of LocalDate that contains each day of 2015 and column of random numbers."
]
},
{
"cell_type": "code",
"metadata": {
"ExecuteTime": {
"end_time": "2025-03-07T10:53:39.407487Z",
"start_time": "2025-03-07T10:53:39.396034Z"
}
},
"source": [
"import kotlinx.datetime.*"
],
"outputs": [],
"execution_count": 44
},
{
"cell_type": "code",
"metadata": {
"ExecuteTime": {
"end_time": "2025-03-07T10:53:39.595046Z",
"start_time": "2025-03-07T10:53:39.417471Z"
}
},
"source": [
"class DateRangeIterator(first: LocalDate, last: LocalDate, val step: Int) : Iterator {\n",
" private val finalElement: LocalDate = last\n",
" private var hasNext: Boolean = if (step > 0) first <= last else first >= last\n",
" private var next: LocalDate = if (hasNext) first else finalElement\n",
"\n",
" override fun hasNext(): Boolean = hasNext\n",
"\n",
" override fun next(): LocalDate {\n",
" val value = next\n",
" if (value == finalElement) {\n",
" if (!hasNext) throw kotlin.NoSuchElementException()\n",
" hasNext = false\n",
" } else {\n",
" next = next.plus(step, DateTimeUnit.DayBased(1))\n",
" }\n",
" return value\n",
" }\n",
"}\n",
"\n",
"operator fun ClosedRange.iterator() = DateRangeIterator(this.start, this.endInclusive, 1)\n",
"\n",
"fun ClosedRange.toList(): List {\n",
" return when (val size = this.start.daysUntil(this.endInclusive)) {\n",
" 0 -> emptyList()\n",
" 1 -> listOf(iterator().next())\n",
" else -> {\n",
" val dest = ArrayList(size)\n",
" for (item in this) {\n",
" dest.add(item)\n",
" }\n",
" dest\n",
" }\n",
" }\n",
"}"
],
"outputs": [],
"execution_count": 45
},
{
"cell_type": "code",
"metadata": {
"ExecuteTime": {
"end_time": "2025-03-07T10:53:39.731960Z",
"start_time": "2025-03-07T10:53:39.622617Z"
}
},
"source": [
"val start = LocalDate(2015, 1, 1)\n",
"val end = LocalDate(2016, 1, 1)\n",
"\n",
"val days = (start..end).toList()\n",
"\n",
"val dti = days.toColumn(\"dti\")\n",
"val s = List(dti.size()) { Random.nextDouble() }.toColumn(\"s\")\n",
"val df = dataFrameOf(dti, s)\n",
"df.head()"
],
"outputs": [
{
"data": {
"text/html": [
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" | dti | s |
|---|
| 2015-01-01 | 0,428694 |
| 2015-01-02 | 0,587807 |
| 2015-01-03 | 0,803538 |
| 2015-01-04 | 0,072165 |
| 2015-01-05 | 0,831086 |
\n",
" \n",
" \n",
" "
],
"application/kotlindataframe+json": "{\"$version\":\"2.1.1\",\"metadata\":{\"columns\":[\"dti\",\"s\"],\"types\":[{\"kind\":\"ValueColumn\",\"type\":\"kotlinx.datetime.LocalDate\"},{\"kind\":\"ValueColumn\",\"type\":\"kotlin.Double\"}],\"nrow\":5,\"ncol\":2},\"kotlin_dataframe\":[{\"dti\":\"2015-01-01\",\"s\":0.42869361469417255},{\"dti\":\"2015-01-02\",\"s\":0.5878066441159479},{\"dti\":\"2015-01-03\",\"s\":0.8035375057384638},{\"dti\":\"2015-01-04\",\"s\":0.07216474014378904},{\"dti\":\"2015-01-05\",\"s\":0.8310862424835981}]}"
},
"execution_count": 46,
"metadata": {},
"output_type": "execute_result"
}
],
"execution_count": 46
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**32.** Find the sum of the values in s for every Wednesday."
]
},
{
"cell_type": "code",
"metadata": {
"ExecuteTime": {
"end_time": "2025-03-07T10:53:39.771738Z",
"start_time": "2025-03-07T10:53:39.734382Z"
}
},
"source": [
"df.filter { dti.dayOfWeek == DayOfWeek.TUESDAY }.sum { s }"
],
"outputs": [
{
"data": {
"text/plain": [
"25.711197390216796"
]
},
"execution_count": 47,
"metadata": {},
"output_type": "execute_result"
}
],
"execution_count": 47
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**33.** For each calendar month in s, find the mean of values."
]
},
{
"cell_type": "code",
"metadata": {
"ExecuteTime": {
"end_time": "2025-03-07T10:53:39.829455Z",
"start_time": "2025-03-07T10:53:39.794881Z"
}
},
"source": [
"df.groupBy { dti.map { it.month } named \"month\" }.mean()"
],
"outputs": [
{
"data": {
"text/html": [
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" | month | s |
|---|
| JANUARY | 0,498863 |
| FEBRUARY | 0,523210 |
| MARCH | 0,522893 |
| APRIL | 0,565603 |
| MAY | 0,599494 |
| JUNE | 0,470282 |
| JULY | 0,514119 |
| AUGUST | 0,532443 |
| SEPTEMBER | 0,431539 |
| OCTOBER | 0,452930 |
| NOVEMBER | 0,417230 |
| DECEMBER | 0,536983 |
\n",
" \n",
" \n",
" "
],
"application/kotlindataframe+json": "{\"$version\":\"2.1.1\",\"metadata\":{\"columns\":[\"month\",\"s\"],\"types\":[{\"kind\":\"ValueColumn\",\"type\":\"java.time.Month\"},{\"kind\":\"ValueColumn\",\"type\":\"kotlin.Double\"}],\"nrow\":12,\"ncol\":2},\"kotlin_dataframe\":[{\"month\":\"JANUARY\",\"s\":0.49886259661768795},{\"month\":\"FEBRUARY\",\"s\":0.5232101759296012},{\"month\":\"MARCH\",\"s\":0.5228930364548733},{\"month\":\"APRIL\",\"s\":0.5656027192380827},{\"month\":\"MAY\",\"s\":0.5994940445050382},{\"month\":\"JUNE\",\"s\":0.4702818837208372},{\"month\":\"JULY\",\"s\":0.514118942994226},{\"month\":\"AUGUST\",\"s\":0.5324426386910432},{\"month\":\"SEPTEMBER\",\"s\":0.43153905610005283},{\"month\":\"OCTOBER\",\"s\":0.4529295994603305},{\"month\":\"NOVEMBER\",\"s\":0.41722970069860005},{\"month\":\"DECEMBER\",\"s\":0.536982911337619}]}"
},
"execution_count": 48,
"metadata": {},
"output_type": "execute_result"
}
],
"execution_count": 48
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**34.** For each group of four consecutive calendar months in s, find the date on which the highest value occurred."
]
},
{
"cell_type": "code",
"metadata": {
"ExecuteTime": {
"end_time": "2025-03-07T10:53:39.883384Z",
"start_time": "2025-03-07T10:53:39.832027Z"
}
},
"source": [
"df.add(\"month4\") {\n",
" when (dti.monthNumber) {\n",
" in 1..4 -> 1\n",
" in 5..8 -> 2\n",
" else -> 3\n",
" }\n",
"}.groupBy(\"month4\").aggregate { maxBy(s) into \"max\" }"
],
"outputs": [
{
"data": {
"text/html": [
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" | month4 | max | | |
|---|
| dti | s | month4 |
|---|
| 1 | 2016-01-01 | 0,996645 | 1 |
| 2 | 2015-08-18 | 0,995806 | 2 |
| 3 | 2015-12-17 | 0,984362 | 3 |
\n",
" \n",
" \n",
" "
],
"application/kotlindataframe+json": "{\"$version\":\"2.1.1\",\"metadata\":{\"columns\":[\"month4\",\"max\"],\"types\":[{\"kind\":\"ValueColumn\",\"type\":\"kotlin.Int\"},{\"kind\":\"ColumnGroup\"}],\"nrow\":3,\"ncol\":2},\"kotlin_dataframe\":[{\"month4\":1,\"max\":{\"data\":{\"dti\":\"2016-01-01\",\"s\":0.9966451059523758,\"month4\":1},\"metadata\":{\"kind\":\"ColumnGroup\",\"columns\":[\"dti\",\"s\",\"month4\"],\"types\":[{\"kind\":\"ValueColumn\",\"type\":\"kotlinx.datetime.LocalDate\"},{\"kind\":\"ValueColumn\",\"type\":\"kotlin.Double\"},{\"kind\":\"ValueColumn\",\"type\":\"kotlin.Int\"}]}}},{\"month4\":2,\"max\":{\"data\":{\"dti\":\"2015-08-18\",\"s\":0.9958061744880964,\"month4\":2},\"metadata\":{\"kind\":\"ColumnGroup\",\"columns\":[\"dti\",\"s\",\"month4\"],\"types\":[{\"kind\":\"ValueColumn\",\"type\":\"kotlinx.datetime.LocalDate\"},{\"kind\":\"ValueColumn\",\"type\":\"kotlin.Double\"},{\"kind\":\"ValueColumn\",\"type\":\"kotlin.Int\"}]}}},{\"month4\":3,\"max\":{\"data\":{\"dti\":\"2015-12-17\",\"s\":0.9843617469965643,\"month4\":3},\"metadata\":{\"kind\":\"ColumnGroup\",\"columns\":[\"dti\",\"s\",\"month4\"],\"types\":[{\"kind\":\"ValueColumn\",\"type\":\"kotlinx.datetime.LocalDate\"},{\"kind\":\"ValueColumn\",\"type\":\"kotlin.Double\"},{\"kind\":\"ValueColumn\",\"type\":\"kotlin.Int\"}]}}}]}"
},
"execution_count": 49,
"metadata": {},
"output_type": "execute_result"
}
],
"execution_count": 49
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**35.** Create a column consisting of the third Thursday in each month for the years 2015 and 2016."
]
},
{
"cell_type": "code",
"metadata": {
"ExecuteTime": {
"end_time": "2025-03-07T10:53:39.920689Z",
"start_time": "2025-03-07T10:53:39.906876Z"
}
},
"source": [
"import java.time.temporal.WeekFields\n",
"import java.util.*"
],
"outputs": [],
"execution_count": 50
},
{
"cell_type": "code",
"metadata": {
"ExecuteTime": {
"end_time": "2025-03-07T10:53:39.991666Z",
"start_time": "2025-03-07T10:53:39.935515Z"
}
},
"source": [
"val start = LocalDate(2015, 1, 1)\n",
"val end = LocalDate(2016, 12, 31)\n",
"\n",
"(start..end).toList().toColumn(\"3thu\").filter {\n",
" it.toJavaLocalDate()[WeekFields.of(Locale.ENGLISH).weekOfMonth()] == 3\n",
" && it.dayOfWeek.value == 4\n",
"}"
],
"outputs": [
{
"data": {
"text/html": [
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" | 3thu |
|---|
| 2015-01-15 |
| 2015-02-19 |
| 2015-03-19 |
| 2015-04-16 |
| 2015-05-14 |
| 2015-06-18 |
| 2015-07-16 |
| 2015-08-13 |
| 2015-09-17 |
| 2015-10-15 |
| 2015-11-19 |
| 2015-12-17 |
| 2016-01-14 |
| 2016-02-18 |
| 2016-03-17 |
| 2016-04-14 |
| 2016-05-19 |
| 2016-06-16 |
| 2016-07-14 |
| 2016-08-18 |
\n",
" \n",
" \n",
" "
],
"application/kotlindataframe+json": "{\"$version\":\"2.1.1\",\"metadata\":{\"columns\":[\"3thu\"],\"types\":[{\"kind\":\"ValueColumn\",\"type\":\"kotlinx.datetime.LocalDate\"}],\"nrow\":24,\"ncol\":1},\"kotlin_dataframe\":[{\"3thu\":\"2015-01-15\"},{\"3thu\":\"2015-02-19\"},{\"3thu\":\"2015-03-19\"},{\"3thu\":\"2015-04-16\"},{\"3thu\":\"2015-05-14\"},{\"3thu\":\"2015-06-18\"},{\"3thu\":\"2015-07-16\"},{\"3thu\":\"2015-08-13\"},{\"3thu\":\"2015-09-17\"},{\"3thu\":\"2015-10-15\"},{\"3thu\":\"2015-11-19\"},{\"3thu\":\"2015-12-17\"},{\"3thu\":\"2016-01-14\"},{\"3thu\":\"2016-02-18\"},{\"3thu\":\"2016-03-17\"},{\"3thu\":\"2016-04-14\"},{\"3thu\":\"2016-05-19\"},{\"3thu\":\"2016-06-16\"},{\"3thu\":\"2016-07-14\"},{\"3thu\":\"2016-08-18\"}]}"
},
"execution_count": 51,
"metadata": {},
"output_type": "execute_result"
}
],
"execution_count": 51
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Cleaning Data\n",
"### Making a DataFrame easier to work with\n",
"Difficulty: *easy/medium*\n",
"\n",
"It happens all the time: someone gives you data containing malformed strings, lists and missing data. How do you tidy it up so you can get on with the analysis?\n",
"\n",
"Take this monstrosity as the DataFrame to use in the following puzzles:\n",
"```kotlin\n",
"val fromTo = listOf(\"LoNDon_paris\", \"MAdrid_miLAN\", \"londON_StockhOlm\", \"Budapest_PaRis\", \"Brussels_londOn\").toColumn(\"From_To\")\n",
"val flightNumber = listOf(10045.0, Double.NaN, 10065.0, Double.NaN, 10085.0).toColumn(\"FlightNumber\")\n",
"val recentDelays = listOf(listOf(23, 47), listOf(), listOf(24, 43, 87), listOf(13), listOf(67, 32)).toColumn(\"RecentDelays\")\n",
"val airline = listOf(\"KLM(!)\", \" (12)\", \"(British Airways. )\", \"12. Air France\", \"'Swiss Air'\").toColumn(\"Airline\")\n",
"\n",
"val df = dataFrameOf(fromTo, flightNumber, recentDelays, airline)\n",
"```\n",
"\n",
"It looks like this:\n",
"```\n",
"From_To FlightNumber RecentDelays Airline\n",
"LoNDon_paris 10045.000000 [23, 47] KLM(!)\n",
"MAdrid_miLAN NaN [] {Air France} (12)\n",
"londON_StockhOlm 10065.000000 [24, 43, 87] (British Airways. )\n",
"Budapest_PaRis NaN [13] 12. Air France\n",
"Brussels_londOn 10085.000000 [67, 32] 'Swiss Air'\n",
"```"
]
},
{
"cell_type": "code",
"metadata": {
"ExecuteTime": {
"end_time": "2025-03-07T10:53:40.140992Z",
"start_time": "2025-03-07T10:53:40.019863Z"
}
},
"source": [
"val fromTo = listOf(\"LoNDon_paris\", \"MAdrid_miLAN\", \"londON_StockhOlm\", \"Budapest_PaRis\", \"Brussels_londOn\").toColumn(\"From_To\")\n",
"val flightNumber = listOf(10045.0, Double.NaN, 10065.0, Double.NaN, 10085.0).toColumn(\"FlightNumber\")\n",
"val recentDelays = listOf(listOf(23, 47), listOf(), listOf(24, 43, 87), listOf(13), listOf(67, 32)).toColumn(\"RecentDelays\")\n",
"val airline = listOf(\"KLM(!)\", \"{Air France} (12)\", \"(British Airways. )\", \"12. Air France\", \"'Swiss Air'\").toColumn(\"Airline\")\n",
"\n",
"var df = dataFrameOf(fromTo, flightNumber, recentDelays, airline)\n",
"df"
],
"outputs": [
{
"data": {
"text/html": [
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" | From_To | FlightNumber | RecentDelays | Airline |
|---|
| LoNDon_paris | 10045,000000 | [23, 47] | KLM(!) |
| MAdrid_miLAN | NaN | [ ] | {Air France} (12) |
| londON_StockhOlm | 10065,000000 | [24, 43, 87] | (British Airways. ) |
| Budapest_PaRis | NaN | [13] | 12. Air France |
| Brussels_londOn | 10085,000000 | [67, 32] | 'Swiss Air' |
\n",
" \n",
" \n",
" "
],
"application/kotlindataframe+json": "{\"$version\":\"2.1.1\",\"metadata\":{\"columns\":[\"From_To\",\"FlightNumber\",\"RecentDelays\",\"Airline\"],\"types\":[{\"kind\":\"ValueColumn\",\"type\":\"kotlin.String\"},{\"kind\":\"ValueColumn\",\"type\":\"kotlin.Double\"},{\"kind\":\"ValueColumn\",\"type\":\"kotlin.collections.List\"},{\"kind\":\"ValueColumn\",\"type\":\"kotlin.String\"}],\"nrow\":5,\"ncol\":4},\"kotlin_dataframe\":[{\"From_To\":\"LoNDon_paris\",\"FlightNumber\":10045.0,\"RecentDelays\":[23,47],\"Airline\":\"KLM(!)\"},{\"From_To\":\"MAdrid_miLAN\",\"FlightNumber\":NaN,\"RecentDelays\":[],\"Airline\":\"{Air France} (12)\"},{\"From_To\":\"londON_StockhOlm\",\"FlightNumber\":10065.0,\"RecentDelays\":[24,43,87],\"Airline\":\"(British Airways. )\"},{\"From_To\":\"Budapest_PaRis\",\"FlightNumber\":NaN,\"RecentDelays\":[13],\"Airline\":\"12. Air France\"},{\"From_To\":\"Brussels_londOn\",\"FlightNumber\":10085.0,\"RecentDelays\":[67,32],\"Airline\":\"'Swiss Air'\"}]}"
},
"execution_count": 52,
"metadata": {},
"output_type": "execute_result"
}
],
"execution_count": 52
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**36.** Some values in the FlightNumber column are missing (they are NaN). These numbers are meant to increase by 10 with each row, so 10055 and 10075 need to be put in place. Modify df to fill in these missing numbers and make the column an integer column (instead of a float column)."
]
},
{
"cell_type": "code",
"metadata": {
"ExecuteTime": {
"end_time": "2025-03-07T10:53:40.197286Z",
"start_time": "2025-03-07T10:53:40.145912Z"
}
},
"source": [
"df = df.fillNaNs { FlightNumber }\n",
" .with { prev()!!.FlightNumber + (next()!!.FlightNumber - prev()!!.FlightNumber) / 2 }\n",
" .convert { FlightNumber }.toInt()\n",
"df"
],
"outputs": [
{
"data": {
"text/html": [
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" | From_To | FlightNumber | RecentDelays | Airline |
|---|
| LoNDon_paris | 10045 | [23, 47] | KLM(!) |
| MAdrid_miLAN | 10055 | [ ] | {Air France} (12) |
| londON_StockhOlm | 10065 | [24, 43, 87] | (British Airways. ) |
| Budapest_PaRis | 10075 | [13] | 12. Air France |
| Brussels_londOn | 10085 | [67, 32] | 'Swiss Air' |
\n",
" \n",
" \n",
" "
],
"application/kotlindataframe+json": "{\"$version\":\"2.1.1\",\"metadata\":{\"columns\":[\"From_To\",\"FlightNumber\",\"RecentDelays\",\"Airline\"],\"types\":[{\"kind\":\"ValueColumn\",\"type\":\"kotlin.String\"},{\"kind\":\"ValueColumn\",\"type\":\"kotlin.Int\"},{\"kind\":\"ValueColumn\",\"type\":\"kotlin.collections.List\"},{\"kind\":\"ValueColumn\",\"type\":\"kotlin.String\"}],\"nrow\":5,\"ncol\":4},\"kotlin_dataframe\":[{\"From_To\":\"LoNDon_paris\",\"FlightNumber\":10045,\"RecentDelays\":[23,47],\"Airline\":\"KLM(!)\"},{\"From_To\":\"MAdrid_miLAN\",\"FlightNumber\":10055,\"RecentDelays\":[],\"Airline\":\"{Air France} (12)\"},{\"From_To\":\"londON_StockhOlm\",\"FlightNumber\":10065,\"RecentDelays\":[24,43,87],\"Airline\":\"(British Airways. )\"},{\"From_To\":\"Budapest_PaRis\",\"FlightNumber\":10075,\"RecentDelays\":[13],\"Airline\":\"12. Air France\"},{\"From_To\":\"Brussels_londOn\",\"FlightNumber\":10085,\"RecentDelays\":[67,32],\"Airline\":\"'Swiss Air'\"}]}"
},
"execution_count": 53,
"metadata": {},
"output_type": "execute_result"
}
],
"execution_count": 53
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**37.** The **From_To** column would be better as two separate columns! Split each string on the underscore delimiter **_** to give a new two columns. Assign the correct names 'From' and 'To' to this columns."
]
},
{
"cell_type": "code",
"metadata": {
"ExecuteTime": {
"end_time": "2025-03-07T10:53:40.343480Z",
"start_time": "2025-03-07T10:53:40.251073Z"
}
},
"source": [
"var df2 = df.split { From_To }.by(\"_\").into(\"From\", \"To\")\n",
"df2"
],
"outputs": [
{
"data": {
"text/html": [
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" | From | To | FlightNumber | RecentDelays | Airline |
|---|
| LoNDon | paris | 10045 | [23, 47] | KLM(!) |
| MAdrid | miLAN | 10055 | [ ] | {Air France} (12) |
| londON | StockhOlm | 10065 | [24, 43, 87] | (British Airways. ) |
| Budapest | PaRis | 10075 | [13] | 12. Air France |
| Brussels | londOn | 10085 | [67, 32] | 'Swiss Air' |
\n",
" \n",
" \n",
" "
],
"application/kotlindataframe+json": "{\"$version\":\"2.1.1\",\"metadata\":{\"columns\":[\"From\",\"To\",\"FlightNumber\",\"RecentDelays\",\"Airline\"],\"types\":[{\"kind\":\"ValueColumn\",\"type\":\"kotlin.String\"},{\"kind\":\"ValueColumn\",\"type\":\"kotlin.String\"},{\"kind\":\"ValueColumn\",\"type\":\"kotlin.Int\"},{\"kind\":\"ValueColumn\",\"type\":\"kotlin.collections.List\"},{\"kind\":\"ValueColumn\",\"type\":\"kotlin.String\"}],\"nrow\":5,\"ncol\":5},\"kotlin_dataframe\":[{\"From\":\"LoNDon\",\"To\":\"paris\",\"FlightNumber\":10045,\"RecentDelays\":[23,47],\"Airline\":\"KLM(!)\"},{\"From\":\"MAdrid\",\"To\":\"miLAN\",\"FlightNumber\":10055,\"RecentDelays\":[],\"Airline\":\"{Air France} (12)\"},{\"From\":\"londON\",\"To\":\"StockhOlm\",\"FlightNumber\":10065,\"RecentDelays\":[24,43,87],\"Airline\":\"(British Airways. )\"},{\"From\":\"Budapest\",\"To\":\"PaRis\",\"FlightNumber\":10075,\"RecentDelays\":[13],\"Airline\":\"12. Air France\"},{\"From\":\"Brussels\",\"To\":\"londOn\",\"FlightNumber\":10085,\"RecentDelays\":[67,32],\"Airline\":\"'Swiss Air'\"}]}"
},
"execution_count": 54,
"metadata": {},
"output_type": "execute_result"
}
],
"execution_count": 54
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**38.** Notice how the capitalisation of the city names is all mixed up in this temporary DataFrame 'temp'. Standardise the strings so that only the first letter is uppercase (e.g. \"londON\" should become \"London\".)"
]
},
{
"cell_type": "code",
"metadata": {
"ExecuteTime": {
"end_time": "2025-03-07T10:53:40.424920Z",
"start_time": "2025-03-07T10:53:40.348965Z"
}
},
"source": [
"df2 = df2.update { From and To }.with { it.lowercase().replaceFirstChar(Char::uppercase) }\n",
"df2"
],
"outputs": [
{
"data": {
"text/html": [
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" | From | To | FlightNumber | RecentDelays | Airline |
|---|
| London | Paris | 10045 | [23, 47] | KLM(!) |
| Madrid | Milan | 10055 | [ ] | {Air France} (12) |
| London | Stockholm | 10065 | [24, 43, 87] | (British Airways. ) |
| Budapest | Paris | 10075 | [13] | 12. Air France |
| Brussels | London | 10085 | [67, 32] | 'Swiss Air' |
\n",
" \n",
" \n",
" "
],
"application/kotlindataframe+json": "{\"$version\":\"2.1.1\",\"metadata\":{\"columns\":[\"From\",\"To\",\"FlightNumber\",\"RecentDelays\",\"Airline\"],\"types\":[{\"kind\":\"ValueColumn\",\"type\":\"kotlin.String\"},{\"kind\":\"ValueColumn\",\"type\":\"kotlin.String\"},{\"kind\":\"ValueColumn\",\"type\":\"kotlin.Int\"},{\"kind\":\"ValueColumn\",\"type\":\"kotlin.collections.List\"},{\"kind\":\"ValueColumn\",\"type\":\"kotlin.String\"}],\"nrow\":5,\"ncol\":5},\"kotlin_dataframe\":[{\"From\":\"London\",\"To\":\"Paris\",\"FlightNumber\":10045,\"RecentDelays\":[23,47],\"Airline\":\"KLM(!)\"},{\"From\":\"Madrid\",\"To\":\"Milan\",\"FlightNumber\":10055,\"RecentDelays\":[],\"Airline\":\"{Air France} (12)\"},{\"From\":\"London\",\"To\":\"Stockholm\",\"FlightNumber\":10065,\"RecentDelays\":[24,43,87],\"Airline\":\"(British Airways. )\"},{\"From\":\"Budapest\",\"To\":\"Paris\",\"FlightNumber\":10075,\"RecentDelays\":[13],\"Airline\":\"12. Air France\"},{\"From\":\"Brussels\",\"To\":\"London\",\"FlightNumber\":10085,\"RecentDelays\":[67,32],\"Airline\":\"'Swiss Air'\"}]}"
},
"execution_count": 55,
"metadata": {},
"output_type": "execute_result"
}
],
"execution_count": 55
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**39.** In the **Airline** column, you can see some extra punctuation and symbols have appeared around the airline names. Pull out just the airline name. E.g. `'(British Airways. )'` should become `'British Airways'`."
]
},
{
"cell_type": "code",
"metadata": {
"ExecuteTime": {
"end_time": "2025-03-07T10:53:40.516100Z",
"start_time": "2025-03-07T10:53:40.469987Z"
}
},
"source": [
"df2 = df2.update { Airline }.with {\n",
" \"([a-zA-Z\\\\s]+)\".toRegex().find(it)?.value ?: \"\"\n",
"}\n",
"df2"
],
"outputs": [
{
"data": {
"text/html": [
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" | From | To | FlightNumber | RecentDelays | Airline |
|---|
| London | Paris | 10045 | [23, 47] | KLM |
| Madrid | Milan | 10055 | [ ] | Air France |
| London | Stockholm | 10065 | [24, 43, 87] | British Airways |
| Budapest | Paris | 10075 | [13] | Air France |
| Brussels | London | 10085 | [67, 32] | Swiss Air |
\n",
" \n",
" \n",
" "
],
"application/kotlindataframe+json": "{\"$version\":\"2.1.1\",\"metadata\":{\"columns\":[\"From\",\"To\",\"FlightNumber\",\"RecentDelays\",\"Airline\"],\"types\":[{\"kind\":\"ValueColumn\",\"type\":\"kotlin.String\"},{\"kind\":\"ValueColumn\",\"type\":\"kotlin.String\"},{\"kind\":\"ValueColumn\",\"type\":\"kotlin.Int\"},{\"kind\":\"ValueColumn\",\"type\":\"kotlin.collections.List\"},{\"kind\":\"ValueColumn\",\"type\":\"kotlin.String\"}],\"nrow\":5,\"ncol\":5},\"kotlin_dataframe\":[{\"From\":\"London\",\"To\":\"Paris\",\"FlightNumber\":10045,\"RecentDelays\":[23,47],\"Airline\":\"KLM\"},{\"From\":\"Madrid\",\"To\":\"Milan\",\"FlightNumber\":10055,\"RecentDelays\":[],\"Airline\":\"Air France\"},{\"From\":\"London\",\"To\":\"Stockholm\",\"FlightNumber\":10065,\"RecentDelays\":[24,43,87],\"Airline\":\"British Airways\"},{\"From\":\"Budapest\",\"To\":\"Paris\",\"FlightNumber\":10075,\"RecentDelays\":[13],\"Airline\":\" Air France\"},{\"From\":\"Brussels\",\"To\":\"London\",\"FlightNumber\":10085,\"RecentDelays\":[67,32],\"Airline\":\"Swiss Air\"}]}"
},
"execution_count": 56,
"metadata": {},
"output_type": "execute_result"
}
],
"execution_count": 56
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**40.** In the **RecentDelays** column, the values have been entered into the DataFrame as a list. We would like each first value in its own column, each second value in its own column, and so on. If there isn't an Nth value, the value should be `null`.\n",
"\n",
"Expand the column of lists into columns named 'delays_' and replace the unwanted RecentDelays column in `df` with 'delays'."
]
},
{
"cell_type": "code",
"metadata": {
"ExecuteTime": {
"end_time": "2025-03-07T10:53:40.631283Z",
"start_time": "2025-03-07T10:53:40.521458Z"
}
},
"source": [
"val prep_df = df2.split { RecentDelays }.into { \"delay_$it\" }\n",
"prep_df"
],
"outputs": [
{
"data": {
"text/html": [
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" | From | To | FlightNumber | delay_1 | delay_2 | delay_3 | Airline |
|---|
| London | Paris | 10045 | 23 | 47 | null | KLM |
| Madrid | Milan | 10055 | null | null | null | Air France |
| London | Stockholm | 10065 | 24 | 43 | 87 | British Airways |
| Budapest | Paris | 10075 | 13 | null | null | Air France |
| Brussels | London | 10085 | 67 | 32 | null | Swiss Air |
\n",
" \n",
" \n",
" "
],
"application/kotlindataframe+json": "{\"$version\":\"2.1.1\",\"metadata\":{\"columns\":[\"From\",\"To\",\"FlightNumber\",\"delay_1\",\"delay_2\",\"delay_3\",\"Airline\"],\"types\":[{\"kind\":\"ValueColumn\",\"type\":\"kotlin.String\"},{\"kind\":\"ValueColumn\",\"type\":\"kotlin.String\"},{\"kind\":\"ValueColumn\",\"type\":\"kotlin.Int\"},{\"kind\":\"ValueColumn\",\"type\":\"kotlin.Int?\"},{\"kind\":\"ValueColumn\",\"type\":\"kotlin.Int?\"},{\"kind\":\"ValueColumn\",\"type\":\"kotlin.Int?\"},{\"kind\":\"ValueColumn\",\"type\":\"kotlin.String\"}],\"nrow\":5,\"ncol\":7},\"kotlin_dataframe\":[{\"From\":\"London\",\"To\":\"Paris\",\"FlightNumber\":10045,\"delay_1\":23,\"delay_2\":47,\"delay_3\":null,\"Airline\":\"KLM\"},{\"From\":\"Madrid\",\"To\":\"Milan\",\"FlightNumber\":10055,\"delay_1\":null,\"delay_2\":null,\"delay_3\":null,\"Airline\":\"Air France\"},{\"From\":\"London\",\"To\":\"Stockholm\",\"FlightNumber\":10065,\"delay_1\":24,\"delay_2\":43,\"delay_3\":87,\"Airline\":\"British Airways\"},{\"From\":\"Budapest\",\"To\":\"Paris\",\"FlightNumber\":10075,\"delay_1\":13,\"delay_2\":null,\"delay_3\":null,\"Airline\":\" Air France\"},{\"From\":\"Brussels\",\"To\":\"London\",\"FlightNumber\":10085,\"delay_1\":67,\"delay_2\":32,\"delay_3\":null,\"Airline\":\"Swiss Air\"}]}"
},
"execution_count": 57,
"metadata": {},
"output_type": "execute_result"
}
],
"execution_count": 57
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The DataFrame looks much better now!\n"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Kotlin",
"language": "kotlin",
"name": "kotlin"
},
"language_info": {
"codemirror_mode": "text/x-kotlin",
"file_extension": ".kt",
"mimetype": "text/x-kotlin",
"name": "kotlin",
"nbconvert_exporter": "",
"pygments_lexer": "kotlin",
"version": "1.8.0-dev-707"
},
"ktnbPluginMetadata": {
"projectLibraries": false
}
},
"nbformat": 4,
"nbformat_minor": 1
}