{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Kotlin for Jupyter Notebooks" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "[![Binder](https://mybinder.org/badge_logo.svg)](https://mybinder.org/v2/gh/cheptsov/kotlin-jupyter-demo/master?filepath=index.ipynb)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This notebook shows how to use [Kotlin](https://kotlinlang.org/) with [Jupyter notebooks](https://jupyter.org/)." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Installing the kernel" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Currently, the [Kotlin Jupyter kernel](https://github.com/erokhins/kotlin-jupyter) can be installed only via [conda](https://conda.io/en/latest/):\n", "\n", "```bash\n", "conda install kotlin-jupyter-kernel -c jetbrains\n", "```\n", "\n", "Later, it will also be possible to install it via `pip install`.\n", "\n", "Note that Kotlin Jupyter requires Java 8 to be installed:\n", "\n", "```bash\n", "apt-get install openjdk-8-jre\n", "```\n", "\n", "Once these requirements are satisfied, run `jupyter notebook` and switch to the `Kotlin` kernel." 
] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Running cells" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Here's a simple example with Kotlin code:" ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "class Greeter(val name: String) {\n", " fun greet() {\n", " println(\"Hello, $name!\")\n", " }\n", "}" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Hello, Jupyter!\n" ] } ], "source": [ "Greeter(\"Jupyter\").greet() // Run me" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Configuring Maven dependencies" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Here's another example, courtesy of [thomasnield/kotlin-statistics](https://github.com/thomasnield/kotlin-statistics), showing how to load additional dependencies into the notebook from Maven repositories:" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [], "source": [ "@file:Repository(\"https://repo1.maven.org/maven2\")\n", "@file:DependsOn(\"org.nield:kotlin-statistics:1.2.1\")" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [], "source": [ "import java.time.LocalDate\n", "import java.time.temporal.ChronoUnit\n", "import org.nield.kotlinstatistics.*\n", "\n", "data class Patient(val firstName: String,\n", " val lastName: String,\n", " val gender: Gender,\n", " val birthday: LocalDate,\n", " val whiteBloodCellCount: Int) {\n", "\n", " val age = ChronoUnit.YEARS.between(birthday, LocalDate.now())\n", "}\n", "\n", "val patients = listOf(\n", " Patient(\"John\", \"Simone\", Gender.MALE, LocalDate.of(1989, 1, 7), 4500),\n", " Patient(\"Sarah\", \"Marley\", Gender.FEMALE, LocalDate.of(1970, 2, 5), 6700),\n", " Patient(\"Jessica\", \"Arnold\", Gender.FEMALE, LocalDate.of(1980, 3, 9), 3400),\n", " Patient(\"Sam\", \"Beasley\", Gender.MALE, LocalDate.of(1981, 4, 17), 8800),\n", " 
Patient(\"Dan\", \"Forney\", Gender.MALE, LocalDate.of(1985, 9, 13), 5400),\n", " Patient(\"Lauren\", \"Michaels\", Gender.FEMALE, LocalDate.of(1975, 8, 21), 5000),\n", " Patient(\"Michael\", \"Erlich\", Gender.MALE, LocalDate.of(1985, 12, 17), 4100),\n", " Patient(\"Jason\", \"Miles\", Gender.MALE, LocalDate.of(1991, 11, 1), 3900),\n", " Patient(\"Rebekah\", \"Earley\", Gender.FEMALE, LocalDate.of(1985, 2, 18), 4600),\n", " Patient(\"James\", \"Larson\", Gender.MALE, LocalDate.of(1974, 4, 10), 5100),\n", " Patient(\"Dan\", \"Ulrech\", Gender.MALE, LocalDate.of(1991, 7, 11), 6000),\n", " Patient(\"Heather\", \"Eisner\", Gender.FEMALE, LocalDate.of(1994, 3, 6), 6000),\n", " Patient(\"Jasper\", \"Martin\", Gender.MALE, LocalDate.of(1971, 7, 1), 6000)\n", ")\n", "\n", "enum class Gender {\n", " MALE,\n", " FEMALE\n", "}\n", "\n", "val clusters = patients.multiKMeansCluster(k = 3,\n", " maxIterations = 10000,\n", " trialCount = 50,\n", " xSelector = { it.age.toDouble() },\n", " ySelector = { it.whiteBloodCellCount.toDouble() }\n", ")" ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "CENTROID: 0\n", "\tPatient(firstName=Dan, lastName=Forney, gender=MALE, birthday=1985-09-13, whiteBloodCellCount=5400)\n", "\tPatient(firstName=Lauren, lastName=Michaels, gender=FEMALE, birthday=1975-08-21, whiteBloodCellCount=5000)\n", "\tPatient(firstName=James, lastName=Larson, gender=MALE, birthday=1974-04-10, whiteBloodCellCount=5100)\n", "\tPatient(firstName=Dan, lastName=Ulrech, gender=MALE, birthday=1991-07-11, whiteBloodCellCount=6000)\n", "\tPatient(firstName=Heather, lastName=Eisner, gender=FEMALE, birthday=1994-03-06, whiteBloodCellCount=6000)\n", "\tPatient(firstName=Jasper, lastName=Martin, gender=MALE, birthday=1971-07-01, whiteBloodCellCount=6000)\n", "CENTROID: 1\n", "\tPatient(firstName=John, lastName=Simone, gender=MALE, birthday=1989-01-07, whiteBloodCellCount=4500)\n", 
"\tPatient(firstName=Jessica, lastName=Arnold, gender=FEMALE, birthday=1980-03-09, whiteBloodCellCount=3400)\n", "\tPatient(firstName=Michael, lastName=Erlich, gender=MALE, birthday=1985-12-17, whiteBloodCellCount=4100)\n", "\tPatient(firstName=Jason, lastName=Miles, gender=MALE, birthday=1991-11-01, whiteBloodCellCount=3900)\n", "\tPatient(firstName=Rebekah, lastName=Earley, gender=FEMALE, birthday=1985-02-18, whiteBloodCellCount=4600)\n", "CENTROID: 2\n", "\tPatient(firstName=Sarah, lastName=Marley, gender=FEMALE, birthday=1970-02-05, whiteBloodCellCount=6700)\n", "\tPatient(firstName=Sam, lastName=Beasley, gender=MALE, birthday=1981-04-17, whiteBloodCellCount=8800)\n" ] } ], "source": [ "clusters.forEachIndexed { index, item ->\n", " println(\"CENTROID: $index\")\n", " item.points.forEach {\n", " println(\"\\t$it\")\n", " }\n", "}" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Configuring the built-in libraries via magics" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "For a more straightforward setup, the Kotlin kernel pre-configures certain libraries and lets the notebook user load them via special commands, also known as [magics](https://ipython.readthedocs.io/en/stable/interactive/magics.html). To load pre-configured libraries in a notebook, list their names, separated by commas, after `%use`. Here's how it works:" ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [], "source": [ "%use kotlin-statistics" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "When such a cell is executed, the kernel makes sure that the corresponding Maven repository is configured, the library is loaded, the necessary import statements are added (e.g. in this case `import org.nield.kotlinstatistics.*` is no longer needed), and the necessary renderers are configured. The libraries currently supported by `%use` include: [`kotlin-statistics`](https://github.com/thomasnield/kotlin-statistics), [`klaxon`](https://github.com/cbeust/klaxon), [`krangl`](https://github.com/holgerbrandl/krangl), [`kravis`](https://github.com/holgerbrandl/kravis), and [`lets-plot`](https://github.com/jetbrains/datalore-plot)." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Here's another example, showcasing the [`krangl`](https://github.com/holgerbrandl/krangl) and [`lets-plot`](https://github.com/jetbrains/datalore-plot) libraries:" ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", " " ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "%use lets-plot, krangl" ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [ { "data": { "text/plain": [] }, "execution_count": 8, "metadata": {}, "output_type": "execute_result" }, { "data": { "text/html": [ "
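<table>
<tr><th>sepal_length</th><th>sepal_width</th><th>petal_length</th><th>petal_width</th><th>species</th></tr>
<tr><td>5.1</td><td>3.5</td><td>1.4</td><td>0.2</td><td>Iris-setosa</td></tr>
<tr><td>4.9</td><td>3.0</td><td>1.4</td><td>0.2</td><td>Iris-setosa</td></tr>
<tr><td>4.7</td><td>3.2</td><td>1.3</td><td>0.2</td><td>Iris-setosa</td></tr>
<tr><td>4.6</td><td>3.1</td><td>1.5</td><td>0.2</td><td>Iris-setosa</td></tr>
<tr><td>5.0</td><td>3.6</td><td>1.4</td><td>0.2</td><td>Iris-setosa</td></tr>
</table>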
" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "val df = DataFrame.readCSV(\"data/iris.csv\")\n", "df.head()" ] }, { "cell_type": "code", "execution_count": 9, "metadata": { "scrolled": true }, "outputs": [ { "data": { "text/plain": [] }, "execution_count": 9, "metadata": {}, "output_type": "execute_result" }, { "data": { "text/html": [ "
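<table>
<tr><th>species</th><th>n</th></tr>
<tr><td>Iris-setosa</td><td>50</td></tr>
<tr><td>Iris-versicolor</td><td>50</td></tr>
<tr><td>Iris-virginica</td><td>50</td></tr>
</table>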
" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "df.groupBy(\"species\").count()" ] }, { "cell_type": "code", "execution_count": 10, "metadata": { "scrolled": false }, "outputs": [ { "data": { "text/plain": [] }, "execution_count": 10, "metadata": {}, "output_type": "execute_result" }, { "data": { "text/html": [ "\n", "
\n", " \n", " " ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "val points = geom_point(\n", " data = mapOf(\n", " \"x\" to df[\"sepal_length\"].asDoubles().toList(),\n", " \"y\" to df[\"sepal_width\"].asDoubles().toList(),\n", " \"color\" to df[\"species\"].asStrings().toList()\n", " \n", " ), alpha=1.0)\n", "{\n", " x = \"x\" \n", " y = \"y\"\n", " color = \"color\"\n", "}\n", "\n", "ggplot() + points" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Useful libraries" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "* [kotlin-statistics](https://github.com/thomasnield/kotlin-statistics) is a library that provides a set of extension functions for exploratory and production statistics. It supports basic numeric list/sequence/array functions (from `sum` to `skewness`), slicing operators (e.g. `countBy`, `simpleRegressionBy`), binning operations, discrete PDF sampling, a naive Bayes classifier, clustering, linear regression, and more.\n", "* [kmath](https://github.com/mipt-npm/kmath) is a library inspired by `numpy`; it supports algebraic structures and operations, array-like structures, math expressions, histograms, streaming operations, wrappers around [commons-math](http://commons.apache.org/proper/commons-math/) and [koma](https://github.com/kyonifer/koma), and more.\n", "* [krangl](https://github.com/holgerbrandl/krangl) is a library inspired by R's `dplyr` and Python's `pandas`; it provides functionality for data manipulation using a functional-style API and lets you filter, transform, aggregate, and reshape tabular data.\n", "* [lets-plot](https://github.com/JetBrains/lets-plot) is a library for declaratively creating plots based on tabular data; it is inspired by Python's `ggplot` and [The Grammar of Graphics](https://www.amazon.com/Grammar-Graphics-Statistics-Computing/dp/0387245448/); it is tightly integrated with the Kotlin kernel; it is multi-platform and can be used not just with the JVM but also from JS and Python.\n", "* [kravis](https://github.com/holgerbrandl/kravis) is another library inspired by Python's `ggplot` for visualization of tabular data." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Documentation and contribution" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The kernel's source code, along with its documentation, is available on [GitHub](https://github.com/erokhins/kotlin-jupyter)." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The community has already started adopting Kotlin for data science, and this adoption is only growing. It is well worth watching a [talk](https://www.youtube.com/watch?v=yjVW6uCmVBA) by Holger Brandl (the creator of [krangl](https://github.com/holgerbrandl/krangl), a Kotlin analog of Python’s pandas) or another [talk](https://www.youtube.com/watch?v=-zTqtEcnM7A&feature=youtu.be) by Thomas Nield (the creator of [kotlin-statistics](https://github.com/thomasnield/kotlin-statistics)), or reading his [article](https://towardsdatascience.com/introduction-to-kotlin-statistics-cdad3be88b5)." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "Kotlin", "language": "kotlin", "name": "kotlin" }, "language_info": { "codemirror_mode": "text/x-kotlin", "file_extension": "kt", "name": "kotlin" } }, "nbformat": 4, "nbformat_minor": 2 }