{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Kotlin for Jupyter Notebooks"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"[![Binder](https://mybinder.org/badge_logo.svg)](https://mybinder.org/v2/gh/cheptsov/kotlin-jupyter-demo/master?filepath=index.ipynb)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"This notebook shows how to use [Kotlin](https://kotlinlang.org/) with [Jupyter notebooks](https://jupyter.org/)."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Installing the kernel"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Currently, the [Kotlin Jupyter kernel](https://github.com/erokhins/kotlin-jupyter) can be installed only via [conda](https://conda.io/en/latest/):\n",
"\n",
"```bash\n",
"conda install kotlin-jupyter-kernel -c jetbrains\n",
"```\n",
"\n",
"Later, it will also be possible to install it via `pip install`.\n",
"\n",
"Note that the Kotlin Jupyter kernel requires Java 8 to be installed. For example, on a Debian-based system:\n",
"\n",
"```bash\n",
"apt-get install openjdk-8-jre\n",
"```\n",
"\n",
"Once these requirements are satisfied, run `jupyter notebook` and switch to the `Kotlin` kernel."
]
},
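{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a quick sanity check of the Java requirement, a cell along the following lines can be run once the notebook is open with the Kotlin kernel selected. This is only a sketch: it reads two standard JVM system properties and prints them, and the variable names are illustrative.\n",
"\n",
"```kotlin\n",
"// Ask the JVM that hosts the kernel for its version and vendor.\n",
"val javaVersion = System.getProperty(\"java.version\")\n",
"val javaVendor = System.getProperty(\"java.vendor\")\n",
"\n",
"println(\"Kernel is running on Java $javaVersion ($javaVendor)\")\n",
"```\n",
"\n",
"A Java 8 runtime typically reports a version string that starts with `1.8`."
]
},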
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Running cells"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Here's a simple example with Kotlin code:"
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [],
"source": [
"class Greeter(val name: String) {\n",
"    fun greet() {\n",
"        println(\"Hello, $name!\")\n",
"    }\n",
"}"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Hello, Jupyter!\n"
]
}
],
"source": [
"Greeter(\"Jupyter\").greet() // Run me"
]
},
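{
"cell_type": "markdown",
"metadata": {},
"source": [
"Declarations persist across cells, which is why `Greeter` from the first cell can be used in the second one. As a small sketch of building on an earlier cell, the snippet below adds a hypothetical extension function, `greeting`, that returns the text instead of printing it. `Greeter` is repeated here only so that the snippet stands on its own.\n",
"\n",
"```kotlin\n",
"// Same class as above, repeated so this snippet is self-contained.\n",
"class Greeter(val name: String) {\n",
"    fun greet() {\n",
"        println(\"Hello, $name!\")\n",
"    }\n",
"}\n",
"\n",
"// Hypothetical helper: build the greeting as a value rather than printing it.\n",
"fun Greeter.greeting(): String = \"Hello, $name!\"\n",
"\n",
"// The value of the last expression is shown as the cell's result.\n",
"Greeter(\"Kotlin\").greeting()\n",
"```"
]
},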
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Configuring Maven dependencies"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Here's another example, courtesy of [thomasnield/kotlin-statistics](https://github.com/thomasnield/kotlin-statistics), showing how to load additional dependencies into the notebook from Maven repositories:"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [],
"source": [
"@file:Repository(\"https://repo1.maven.org/maven2\")\n",
"@file:DependsOn(\"org.nield:kotlin-statistics:1.2.1\")"
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"outputs": [],
"source": [
"import java.time.LocalDate\n",
"import java.time.temporal.ChronoUnit\n",
"import org.nield.kotlinstatistics.*\n",
"\n",
"data class Patient(\n",
"    val firstName: String,\n",
"    val lastName: String,\n",
"    val gender: Gender,\n",
"    val birthday: LocalDate,\n",
"    val whiteBloodCellCount: Int\n",
") {\n",
"    // Age in whole years, computed when the object is created.\n",
"    val age = ChronoUnit.YEARS.between(birthday, LocalDate.now())\n",
"}\n",
"\n",
"val patients = listOf(\n",
"    Patient(\"John\", \"Simone\", Gender.MALE, LocalDate.of(1989, 1, 7), 4500),\n",
"    Patient(\"Sarah\", \"Marley\", Gender.FEMALE, LocalDate.of(1970, 2, 5), 6700),\n",
"    Patient(\"Jessica\", \"Arnold\", Gender.FEMALE, LocalDate.of(1980, 3, 9), 3400),\n",
"    Patient(\"Sam\", \"Beasley\", Gender.MALE, LocalDate.of(1981, 4, 17), 8800),\n",
"    Patient(\"Dan\", \"Forney\", Gender.MALE, LocalDate.of(1985, 9, 13), 5400),\n",
"    Patient(\"Lauren\", \"Michaels\", Gender.FEMALE, LocalDate.of(1975, 8, 21), 5000),\n",
"    Patient(\"Michael\", \"Erlich\", Gender.MALE, LocalDate.of(1985, 12, 17), 4100),\n",
"    Patient(\"Jason\", \"Miles\", Gender.MALE, LocalDate.of(1991, 11, 1), 3900),\n",
"    Patient(\"Rebekah\", \"Earley\", Gender.FEMALE, LocalDate.of(1985, 2, 18), 4600),\n",
"    Patient(\"James\", \"Larson\", Gender.MALE, LocalDate.of(1974, 4, 10), 5100),\n",
"    Patient(\"Dan\", \"Ulrech\", Gender.MALE, LocalDate.of(1991, 7, 11), 6000),\n",
"    Patient(\"Heather\", \"Eisner\", Gender.FEMALE, LocalDate.of(1994, 3, 6), 6000),\n",
"    Patient(\"Jasper\", \"Martin\", Gender.MALE, LocalDate.of(1971, 7, 1), 6000)\n",
")\n",
"\n",
"enum class Gender {\n",
"    MALE,\n",
"    FEMALE\n",
"}\n",
"\n",
"// Cluster the patients by age (x) and white blood cell count (y).\n",
"val clusters = patients.multiKMeansCluster(\n",
"    k = 3,\n",
"    maxIterations = 10000,\n",
"    trialCount = 50,\n",
"    xSelector = { it.age.toDouble() },\n",
"    ySelector = { it.whiteBloodCellCount.toDouble() }\n",
")"
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"CENTROID: 0\n",
"\tPatient(firstName=Dan, lastName=Forney, gender=MALE, birthday=1985-09-13, whiteBloodCellCount=5400)\n",
"\tPatient(firstName=Lauren, lastName=Michaels, gender=FEMALE, birthday=1975-08-21, whiteBloodCellCount=5000)\n",
"\tPatient(firstName=James, lastName=Larson, gender=MALE, birthday=1974-04-10, whiteBloodCellCount=5100)\n",
"\tPatient(firstName=Dan, lastName=Ulrech, gender=MALE, birthday=1991-07-11, whiteBloodCellCount=6000)\n",
"\tPatient(firstName=Heather, lastName=Eisner, gender=FEMALE, birthday=1994-03-06, whiteBloodCellCount=6000)\n",
"\tPatient(firstName=Jasper, lastName=Martin, gender=MALE, birthday=1971-07-01, whiteBloodCellCount=6000)\n",
"CENTROID: 1\n",
"\tPatient(firstName=John, lastName=Simone, gender=MALE, birthday=1989-01-07, whiteBloodCellCount=4500)\n",
"\tPatient(firstName=Jessica, lastName=Arnold, gender=FEMALE, birthday=1980-03-09, whiteBloodCellCount=3400)\n",
"\tPatient(firstName=Michael, lastName=Erlich, gender=MALE, birthday=1985-12-17, whiteBloodCellCount=4100)\n",
"\tPatient(firstName=Jason, lastName=Miles, gender=MALE, birthday=1991-11-01, whiteBloodCellCount=3900)\n",
"\tPatient(firstName=Rebekah, lastName=Earley, gender=FEMALE, birthday=1985-02-18, whiteBloodCellCount=4600)\n",
"CENTROID: 2\n",
"\tPatient(firstName=Sarah, lastName=Marley, gender=FEMALE, birthday=1970-02-05, whiteBloodCellCount=6700)\n",
"\tPatient(firstName=Sam, lastName=Beasley, gender=MALE, birthday=1981-04-17, whiteBloodCellCount=8800)\n"
]
}
],
"source": [
"clusters.forEachIndexed { index, item ->\n",
"    println(\"CENTROID: $index\")\n",
"    item.points.forEach {\n",
"        println(\"\\t$it\")\n",
"    }\n",
"}"
]
},
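{
"cell_type": "markdown",
"metadata": {},
"source": [
"The same pair of annotations can pull in other artifacts as well. As a sketch, assuming the notebook can reach Maven Central, the following loads Apache Commons Math (`org.apache.commons:commons-math3:3.6.1`, chosen here purely for illustration) and summarizes the white blood cell counts of the sample patients. The `counts` list is copied by hand from the data above so that the snippet does not depend on earlier cells.\n",
"\n",
"```kotlin\n",
"// In the notebook, these two annotations may need to run in a cell of their own,\n",
"// as in the kotlin-statistics example above.\n",
"@file:Repository(\"https://repo1.maven.org/maven2\")\n",
"@file:DependsOn(\"org.apache.commons:commons-math3:3.6.1\")\n",
"\n",
"import org.apache.commons.math3.stat.descriptive.DescriptiveStatistics\n",
"\n",
"// White blood cell counts copied from the sample patients above.\n",
"val counts = listOf(4500, 6700, 3400, 8800, 5400, 5000, 4100, 3900, 4600, 5100, 6000, 6000, 6000)\n",
"\n",
"// Feed the values into a Commons Math accumulator.\n",
"val stats = DescriptiveStatistics()\n",
"counts.forEach { stats.addValue(it.toDouble()) }\n",
"\n",
"println(\"n = ${stats.n}, mean = ${stats.mean}, sd = ${stats.standardDeviation}\")\n",
"```"
]
},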
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Configuring built-in libraries via magics"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"For a more straightforward setup, the Kotlin kernel pre-configures certain libraries and lets the notebook user load them via special commands, also known as [magics](https://ipython.readthedocs.io/en/stable/interactive/magics.html). To load pre-configured libraries into a notebook, list their names, separated by commas, after `%use`. Here's how it works:"
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {},
"outputs": [],
"source": [
"%use kotlin-statistics"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"When such a cell is executed, the kernel makes sure that the corresponding Maven repo is configured, the library is loaded, the necessary import statements are added (e.g. in this case `import org.nield.kotlinstatistics.*` is no longer needed), and the necessary renderers are configured. The libraries currently supported by `%use` include: [`kotlin-statistics`](https://github.com/thomasnield/kotlin-statistics), [`klaxon`](https://github.com/cbeust/klaxon), [`krangl`](https://github.com/holgerbrandl/krangl), [`kravis`](https://github.com/holgerbrandl/kravis), and [`lets-plot`](https://github.com/jetbrains/datalore-plot)."
]
},
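{
"cell_type": "markdown",
"metadata": {},
"source": [
"To make the effect of the magic concrete: for `kotlin-statistics`, the `%use` line corresponds roughly to the manual setup shown earlier in this notebook. The version that the kernel actually pins is not stated here, so `1.2.1` below is an assumption carried over from that earlier cell, and the renderer configuration has no manual counterpart in this sketch.\n",
"\n",
"```kotlin\n",
"// Approximately what `%use kotlin-statistics` arranges for you.\n",
"@file:Repository(\"https://repo1.maven.org/maven2\")\n",
"@file:DependsOn(\"org.nield:kotlin-statistics:1.2.1\") // version is an assumption\n",
"\n",
"import org.nield.kotlinstatistics.*\n",
"```"
]
},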
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Here's another example, showcasing the [`krangl`](https://github.com/holgerbrandl/krangl) and [`lets-plot`](https://github.com/jetbrains/datalore-plot) libraries:"
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
" "
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"%use lets-plot, krangl"
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": []
},
"execution_count": 8,
"metadata": {},
"output_type": "execute_result"
},
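{
"data": {
"text/html": [
"<table>\n",
"<tr><th>sepal_length</th><th>sepal_width</th><th>petal_length</th><th>petal_width</th><th>species</th></tr>\n",
"<tr><td>5.1</td><td>3.5</td><td>1.4</td><td>0.2</td><td>Iris-setosa</td></tr>\n",
"<tr><td>4.9</td><td>3.0</td><td>1.4</td><td>0.2</td><td>Iris-setosa</td></tr>\n",
"<tr><td>4.7</td><td>3.2</td><td>1.3</td><td>0.2</td><td>Iris-setosa</td></tr>\n",
"<tr><td>4.6</td><td>3.1</td><td>1.5</td><td>0.2</td><td>Iris-setosa</td></tr>\n",
"<tr><td>5.0</td><td>3.6</td><td>1.4</td><td>0.2</td><td>Iris-setosa</td></tr>\n",
"</table>"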
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"val df = DataFrame.readCSV(\"data/iris.csv\")\n",
"df.head()"
]
},
{
"cell_type": "code",
"execution_count": 9,
"metadata": {
"scrolled": true
},
"outputs": [
{
"data": {
"text/plain": []
},
"execution_count": 9,
"metadata": {},
"output_type": "execute_result"
},
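{
"data": {
"text/html": [
"<table>\n",
"<tr><th>species</th><th>n</th></tr>\n",
"<tr><td>Iris-setosa</td><td>50</td></tr>\n",
"<tr><td>Iris-versicolor</td><td>50</td></tr>\n",
"<tr><td>Iris-virginica</td><td>50</td></tr>\n",
"</table>"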
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"df.groupBy(\"species\").count()"
]
},
{
"cell_type": "code",
"execution_count": 10,
"metadata": {
"scrolled": false
},
"outputs": [
{
"data": {
"text/plain": []
},
"execution_count": 10,
"metadata": {},
"output_type": "execute_result"
},
{
"data": {
"text/html": [
"\n",
" \n",
" \n",
" "
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"val points = geom_point(\n",
"    data = mapOf(\n",
"        \"x\" to df[\"sepal_length\"].asDoubles().toList(),\n",
"        \"y\" to df[\"sepal_width\"].asDoubles().toList(),\n",
"        \"color\" to df[\"species\"].asStrings().toList()\n",
"    ),\n",
"    alpha = 1.0\n",
") {\n",
"    x = \"x\"\n",
"    y = \"y\"\n",
"    color = \"color\"\n",
"}\n",
"\n",
"ggplot() + points"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Useful libraries"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"* [kotlin-statistics](https://github.com/thomasnield/kotlin-statistics) is a library that provides a set of extension functions for exploratory and production statistics. It supports basic numeric list/sequence/array functions (from `sum` to `skewness`), slicing operators (e.g. `countBy` and `simpleRegressionBy`), binning operations, discrete PDF sampling, a naive Bayes classifier, clustering, linear regression, and more.\n",
"* [kmath](https://github.com/mipt-npm/kmath) is a library inspired by `numpy`; it supports algebraic structures and operations, array-like structures, math expressions, histograms, streaming operations, wrappers around [commons-math](http://commons.apache.org/proper/commons-math/) and [koma](https://github.com/kyonifer/koma), and more.\n",
"* [krangl](https://github.com/holgerbrandl/krangl) is a library inspired by R's `dplyr` and Python's `pandas`; it provides a functional-style API for data manipulation and lets you filter, transform, aggregate, and reshape tabular data.\n",
"* [lets-plot](https://github.com/JetBrains/lets-plot) is a library for declaratively creating plots based on tabular data; it is inspired by R's `ggplot2` and [The Grammar of Graphics](https://www.amazon.com/Grammar-Graphics-Statistics-Computing/dp/0387245448/); it is tightly integrated with the Kotlin kernel, is multi-platform, and can be used not just on the JVM but also from JS and Python.\n",
"* [kravis](https://github.com/holgerbrandl/kravis) is another library for visualizing tabular data, inspired by R's `ggplot2`."
]
},
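{
"cell_type": "markdown",
"metadata": {},
"source": [
"To give a feel for the filter and aggregate operations mentioned for krangl, here is a minimal sketch. It assumes that `%use krangl` has already been run in the notebook, and it builds a tiny in-memory frame whose column names mirror `data/iris.csv` while the values are made up for illustration; the names `sample`, `longSepals`, and `meanBySpecies` are hypothetical.\n",
"\n",
"```kotlin\n",
"import krangl.*\n",
"\n",
"// A tiny in-memory frame: a header followed by row-wise values (made-up data).\n",
"val sample = dataFrameOf(\"species\", \"sepal_length\")(\n",
"    \"Iris-setosa\", 5.1,\n",
"    \"Iris-setosa\", 4.9,\n",
"    \"Iris-versicolor\", 7.0,\n",
"    \"Iris-versicolor\", 6.4\n",
")\n",
"\n",
"// Keep only the rows whose sepal length exceeds 5.0.\n",
"val longSepals = sample.filter { it[\"sepal_length\"] gt 5.0 }\n",
"\n",
"// Aggregate the remaining rows per species.\n",
"val meanBySpecies = longSepals\n",
"    .groupBy(\"species\")\n",
"    .summarize(\"mean_sepal_length\" to { it[\"sepal_length\"].mean() })\n",
"\n",
"meanBySpecies\n",
"```\n",
"\n",
"The same calls should apply unchanged to the `df` loaded from `data/iris.csv` earlier in this notebook."
]
},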
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Documentation and contribution"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The kernel's source code along with documentation is available on [GitHub](https://github.com/erokhins/kotlin-jupyter)."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The community has already started adopting Kotlin for data science, and this adoption is only growing. We highly recommend watching a [talk](https://www.youtube.com/watch?v=yjVW6uCmVBA) by Holger Brandl (the creator of [krangl](https://github.com/holgerbrandl/krangl), a Kotlin analog of Python's pandas) or another [talk](https://www.youtube.com/watch?v=-zTqtEcnM7A&feature=youtu.be) by Thomas Nield (the creator of [kotlin-statistics](https://github.com/thomasnield/kotlin-statistics)), or reading his [article](https://towardsdatascience.com/introduction-to-kotlin-statistics-cdad3be88b5)."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Kotlin",
"language": "kotlin",
"name": "kotlin"
},
"language_info": {
"codemirror_mode": "text/x-kotlin",
"file_extension": "kt",
"name": "kotlin"
}
},
"nbformat": 4,
"nbformat_minor": 2
}