# Quickstart

To start working with Kotlin DataFrame in a notebook, run a cell with the following code:

In [9]:
%useLatestDescriptors
%use dataframe

This will load all necessary DataFrame dependencies (of the latest stable version) and all imports, as well as DataFrame rendering. Learn more [here](https://kotlin.github.io/dataframe/gettingstartedkotlinnotebook.html#integrate-kotlin-dataframe).

## Read DataFrame

Kotlin DataFrame supports all popular data formats, including CSV, JSON and Excel, as well as reading from various databases. Read the "JetBrains Repositories" CSV dataset into a `df` variable:

In [10]:
val df = DataFrame.readCsv(
    "https://raw.githubusercontent.com/Kotlin/dataframe/master/data/jetbrains_repositories.csv"
)

## Display And Explore

To display your dataframe as a cell output, place it on the last line of the cell:

In [11]:
df

full_name,html_url,stargazers_count,topics,watchers
JetBrains/JPS,https://github.com/JetBrains/JPS,23,[],23
JetBrains/YouTrackSharp,https://github.com/JetBrains/YouTrack...,115,"[jetbrains, jetbrains-youtrack, youtr...",115
JetBrains/colorSchemeTool,https://github.com/JetBrains/colorSch...,290,[],290
JetBrains/ideavim,https://github.com/JetBrains/ideavim,6120,"[ideavim, intellij, intellij-platform...",6120
JetBrains/youtrack-vcs-hooks,https://github.com/JetBrains/youtrack...,5,[],5
JetBrains/youtrack-rest-ruby-library,https://github.com/JetBrains/youtrack...,8,[],8
JetBrains/emacs4ij,https://github.com/JetBrains/emacs4ij,47,[],47
JetBrains/codereview4intellij,https://github.com/JetBrains/coderevi...,11,[],11
JetBrains/teamcity-nuget-support,https://github.com/JetBrains/teamcity...,41,"[nuget, nuget-feed, teamcity, teamcit...",41
JetBrains/Grammar-Kit,https://github.com/JetBrains/Grammar-Kit,534,[],534


Kotlin Notebook has special interactive outputs for `DataFrame`. Learn more about them [here](https://kotlin.github.io/dataframe/kotlin-dataframe-features-in-kotlin-notebook.html).

Use the `.describe()` method to get a summary of the dataset: column types, number of nulls, and simple statistics.

In [12]:
df.describe()

name,type,count,unique,nulls,top,freq,mean,std,min,p25,median,p75,max
full_name,String,562,562,0,JetBrains/JPS,1,,,JetBrains/Android-Tuts-Samples,JetBrains/eslint-config,JetBrains/lightbeam,JetBrains/teamcity-bitbucket-issues,JetBrains/ztools
html_url,URL,562,562,0,https://github.com/JetBrains/JPS,1,,,,,,,
stargazers_count,Int,562,165,0,1,100,244.759786,1862.801982,0,2.000000,8.000000,48.000000,39402
topics,String,562,145,0,[],401,,,"[2d, graphics, java, skia]",[],[],"[awt, swing]","[youtrack, youtrack-workflow]"
watchers,Int,562,165,0,1,100,244.759786,1862.801982,0,2.000000,8.000000,48.000000,39402


## Select Columns

Kotlin DataFrame features a typesafe [Columns Selection DSL](https://kotlin.github.io/dataframe/columnselectors.html), enabling flexible and safe selection of any combination of columns.
Column selectors are widely used across operations. One of the simplest examples is `.select { }`, which returns a new DataFrame with only the columns chosen in the [Columns Selection](https://kotlin.github.io/dataframe/columnselectors.html) expression.

After executing the cell where a `DataFrame` variable is declared, an extension with properties for its columns is automatically generated.
These properties can then be used in the [Columns Selection DSL](https://kotlin.github.io/dataframe/columnselectors.html) expression for typesafe and convenient column access.

Select some columns:

In [13]:
// Select "full_name", "stargazers_count" and "topics" columns
val dfSelected = df.select { full_name and stargazers_count and topics }
dfSelected

full_name,stargazers_count,topics
JetBrains/JPS,23,[]
JetBrains/YouTrackSharp,115,"[jetbrains, jetbrains-youtrack, youtr..."
JetBrains/colorSchemeTool,290,[]
JetBrains/ideavim,6120,"[ideavim, intellij, intellij-platform..."
JetBrains/youtrack-vcs-hooks,5,[]
JetBrains/youtrack-rest-ruby-library,8,[]
JetBrains/emacs4ij,47,[]
JetBrains/codereview4intellij,11,[]
JetBrains/teamcity-nuget-support,41,"[nuget, nuget-feed, teamcity, teamcit..."
JetBrains/Grammar-Kit,534,[]

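The generated extension properties become available only after the declaring cell has run. Where they are not yet available, for instance within the same cell that creates the frame, columns can be addressed by name instead. A minimal sketch, assuming a small hand-made frame called `sample` (a hypothetical stand-in, not part of the dataset):

```kotlin
import org.jetbrains.kotlinx.dataframe.api.*

// Hypothetical three-row frame that mimics the dataset's column names
val sample = dataFrameOf("full_name", "stargazers_count", "watchers")(
    "JetBrains/JPS", 23, 23,
    "JetBrains/ideavim", 6120, 6120,
    "JetBrains/Grammar-Kit", 534, 534,
)

// Select columns by name (String API); no generated properties are required
val picked = sample.select("full_name", "stargazers_count")
picked.columnNames() // [full_name, stargazers_count]
```

The trade-off is that column names passed as strings are checked only at runtime, whereas the generated properties are checked by the compiler.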

## Row Filtering

Some operations use a `RowExpression`, i.e., an expression that is evaluated for each `DataFrame` row.
For example, `.filter { }` returns a new `DataFrame` with the rows that satisfy a condition given by a row expression.

Inside a row expression, you can access the values of the current row by column names through auto-generated properties.
This is similar to the [Columns Selection DSL](https://kotlin.github.io/dataframe/columnselectors.html), but in this case the properties represent actual values, not column references.

Filter rows by "stargazers_count" value:

In [14]:
// Keep only rows where "stargazers_count" value is at least 1000
val dfFiltered = dfSelected.filter { stargazers_count >= 1000 }
dfFiltered

full_name,stargazers_count,topics
JetBrains/ideavim,6120,"[ideavim, intellij, intellij-platform..."
JetBrains/MPS,1241,"[domain-specific-language, dsl]"
JetBrains/intellij-community,12926,"[code-editor, ide, intellij, intellij..."
JetBrains/intellij-scala,1066,"[intellij-idea, intellij-plugin, scala]"
JetBrains/kotlin,39402,"[compiler, gradle-plugin, intellij-pl..."
JetBrains/intellij-plugins,1737,[]
JetBrains/Exposed,5688,"[dao, kotlin, orm, sql]"
JetBrains/kotlin-web-site,1074,[kotlin]
JetBrains/idea-gitignore,1181,"[gitignore, ignore-files, intellij, i..."
JetBrains/swot,1072,[]

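Because a row expression works with plain values, conditions can be combined with ordinary Kotlin operators. The following sketch uses a hypothetical `sample` frame and reads values by column name with the `"column"<Type>()` form, so it does not rely on generated properties:

```kotlin
import org.jetbrains.kotlinx.dataframe.api.*

// Hypothetical frame with the same column names as the dataset
val sample = dataFrameOf("full_name", "stargazers_count")(
    "JetBrains/JPS", 23,
    "JetBrains/ideavim", 6120,
    "JetBrains/kotlin", 39402,
)

// The lambda is evaluated once per row; "column"<Type>() returns that row's value
val popularKotlin = sample.filter {
    "stargazers_count"<Int>() >= 1000 && "full_name"<String>().endsWith("kotlin")
}
popularKotlin.rowsCount() // 1
```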

## Rename Columns

Columns can be renamed using the `.rename { }` operation, which also uses the [Columns Selection DSL](https://kotlin.github.io/dataframe/columnselectors.html) to select a column to rename.
The `rename` operation does not perform the renaming immediately; instead, it creates an intermediate object that must be finalized into a new `DataFrame` by calling the `.into()` function with the new column name.

Rename "full_name" and "stargazers_count" columns:

In [15]:
// Rename "full_name" column into "name"
val dfRenamed = dfFiltered
    .rename { full_name }.into("name")
    // And "stargazers_count" into "starsCount"
    .rename { stargazers_count }.into("starsCount")
dfRenamed

name,starsCount,topics
JetBrains/ideavim,6120,"[ideavim, intellij, intellij-platform..."
JetBrains/MPS,1241,"[domain-specific-language, dsl]"
JetBrains/intellij-community,12926,"[code-editor, ide, intellij, intellij..."
JetBrains/intellij-scala,1066,"[intellij-idea, intellij-plugin, scala]"
JetBrains/kotlin,39402,"[compiler, gradle-plugin, intellij-pl..."
JetBrains/intellij-plugins,1737,[]
JetBrains/Exposed,5688,"[dao, kotlin, orm, sql]"
JetBrains/kotlin-web-site,1074,[kotlin]
JetBrains/idea-gitignore,1181,"[gitignore, ignore-files, intellij, i..."
JetBrains/swot,1072,[]

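The two-step shape of `rename` can also be seen on a tiny frame. This is a sketch with a hypothetical `sample` frame, using column names as strings rather than generated properties:

```kotlin
import org.jetbrains.kotlinx.dataframe.api.*

// Hypothetical frame with the original column names
val sample = dataFrameOf("full_name", "stargazers_count")(
    "JetBrains/ideavim", 6120,
    "JetBrains/kotlin", 39402,
)

// rename(...) only selects the column; into(...) produces the new DataFrame
val renamed = sample.rename("full_name").into("name")
renamed.columnNames() // [name, stargazers_count]
```

The original `sample` frame is left unchanged; `renamed` is a new `DataFrame`.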

## Modify Columns

Columns can be modified using the `update { }` and `convert { }` operations.
Both operations select columns to modify via the [Columns Selection DSL](https://kotlin.github.io/dataframe/columnselectors.html) and, similar to `rename`, create an intermediate object that must be finalized to produce a new `DataFrame`.

The `update` operation preserves the original column types, while `convert` allows changing the type.
In both cases, column names and their positions remain unchanged.

Update "name" and convert "topics":

In [26]:
val dfUpdated = dfRenamed
    // Update "name" values with only its second part (after '/')
    .update { name }.with { it.split("/")[1] }
    // Convert "topics" `String` values into `List<String>` by splitting:
    .convert { topics }.with { it.removeSurrounding("[", "]").split(", ") }
dfUpdated

name,starsCount,topics
ideavim,6120,"[ideavim, intellij, intellij-platform..."
MPS,1241,"[domain-specific-language, dsl]"
intellij-community,12926,"[code-editor, ide, intellij, intellij..."
intellij-scala,1066,"[intellij-idea, intellij-plugin, scala]"
kotlin,39402,"[compiler, gradle-plugin, intellij-pl..."
intellij-plugins,1737,[]
Exposed,5688,"[dao, kotlin, orm, sql]"
kotlin-web-site,1074,[kotlin]
idea-gitignore,1181,"[gitignore, ignore-files, intellij, i..."
swot,1072,[]


Check the new type of "topics":

In [27]:
dfUpdated.topics.type()

kotlin.collections.List<kotlin.String>

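The difference between `update` and `convert` can be reproduced on a small frame. The sketch below assumes a hypothetical two-row `sample` frame and addresses columns by name; unlike the cell above, it also drops empty entries after splitting:

```kotlin
import org.jetbrains.kotlinx.dataframe.api.*

// Hypothetical two-row frame shaped like `dfRenamed`
val sample = dataFrameOf("name", "topics")(
    "JetBrains/Exposed", "[dao, kotlin, orm, sql]",
    "JetBrains/swot", "[]",
)

val modified = sample
    // update: String -> String, so the column type is preserved
    .update { "name"<String>() }.with { it.substringAfter("/") }
    // convert: String -> List<String>, so the column type changes
    .convert { "topics"<String>() }.with { raw ->
        raw.removeSurrounding("[", "]").split(", ").filter { it.isNotEmpty() }
    }

modified["name"].toList()   // [Exposed, swot]
modified["topics"].toList() // [[dao, kotlin, orm, sql], []]
```

The extra `filter` step matters for rows whose topics are `[]`: splitting an empty string returns a list with one empty string, not an empty list.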
## Adding New Columns

The `.add { }` function allows creating a `DataFrame` with a new column, where the value for each row is computed based on the existing values in that row. These values can be accessed within the row expressions.

Add a new `Boolean` column "isIntellij":

In [28]:
// Add a `Boolean` column indicating whether the `name` contains the "intellij" substring
// or the topics include "intellij".
val dfWithIsIntellij = dfUpdated.add("isIntellij") {
    name.contains("intellij") || "intellij" in topics
}
dfWithIsIntellij

name,starsCount,topics,isIntellij
ideavim,6120,"[ideavim, intellij, intellij-platform...",True
MPS,1241,"[domain-specific-language, dsl]",False
intellij-community,12926,"[code-editor, ide, intellij, intellij...",True
intellij-scala,1066,"[intellij-idea, intellij-plugin, scala]",True
kotlin,39402,"[compiler, gradle-plugin, intellij-pl...",False
intellij-plugins,1737,[],True
Exposed,5688,"[dao, kotlin, orm, sql]",False
kotlin-web-site,1074,[kotlin],False
idea-gitignore,1181,"[gitignore, ignore-files, intellij, i...",True
swot,1072,[],False

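As a smaller illustration of the same idea, the sketch below adds a column computed from a single existing value. The `sample` frame and the "isPopular" column are hypothetical and used only for this example:

```kotlin
import org.jetbrains.kotlinx.dataframe.api.*

// Hypothetical frame; column names follow the renamed dataset
val sample = dataFrameOf("name", "starsCount")(
    "ideavim", 6120,
    "swot", 1072,
    "JPS", 23,
)

// The row expression is evaluated per row and its result becomes the new column's value
val withFlag = sample.add("isPopular") { "starsCount"<Int>() >= 1000 }
withFlag["isPopular"].toList() // [true, true, false]
```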

## Grouping And Aggregating

A `DataFrame` can be grouped by column keys, meaning its rows are split into groups based on the values in the key columns.
The `.groupBy { }` operation selects columns and groups the `DataFrame` by their values, using them as grouping keys.

The result is a `GroupBy`, a `DataFrame`-like structure that associates each key with the corresponding subset of the original `DataFrame`.

Group `dfWithIsIntellij` by "isIntellij":

In [19]:
val groupedByIsIntellij = dfWithIsIntellij.groupBy { isIntellij }
groupedByIsIntellij

isIntellij,group
true,DataFrame [7 x 4] (showing only top 5 of 7 rows)
false,DataFrame [17 x 4] (showing only top 5 of 17 rows)

Group for isIntellij = true:
name,starsCount,topics,isIntellij
ideavim,6120,"[ideavim, intellij, intellij-platform...",true
intellij-community,12926,"[code-editor, ide, intellij, intellij...",true
intellij-scala,1066,"[intellij-idea, intellij-plugin, scala]",true
intellij-plugins,1737,[],true
idea-gitignore,1181,"[gitignore, ignore-files, intellij, i...",true

Group for isIntellij = false:
name,starsCount,topics,isIntellij
MPS,1241,"[domain-specific-language, dsl]",false
kotlin,39402,"[compiler, gradle-plugin, intellij-pl...",false
Exposed,5688,"[dao, kotlin, orm, sql]",false
kotlin-web-site,1074,[kotlin],false
swot,1072,[],false


A `GroupBy` can be aggregated, that is, you can compute one or several summary statistics for each group.
The result of the aggregation is a `DataFrame` containing the key columns along with new columns holding the computed statistics for the corresponding group.

For example, `count()` computes the size of each group:

In [20]:
groupedByIsIntellij.count()

isIntellij,count
True,7
False,17


Compute several statistics with `.aggregate { }`, which provides a DSL for aggregating:

In [21]:
groupedByIsIntellij.aggregate {
    // Compute sum and max of "starsCount" within each group into "sumStars" and "maxStars" columns
    sumOf { starsCount } into "sumStars"
    maxOf { starsCount } into "maxStars"
}

isIntellij,sumStars,maxStars
True,25221,12926
False,85392,39402

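The whole group-then-aggregate flow can be tried on a few rows. This sketch assumes a hypothetical `sample` frame and uses column names as strings, combining `count()` with the `sumOf` and `maxOf` calls shown above:

```kotlin
import org.jetbrains.kotlinx.dataframe.api.*

// Hypothetical three-row frame shaped like `dfWithIsIntellij`
val sample = dataFrameOf("name", "starsCount", "isIntellij")(
    "ideavim", 6120, true,
    "intellij-scala", 1066, true,
    "kotlin", 39402, false,
)

val stats = sample.groupBy("isIntellij").aggregate {
    // Each statement adds one column to the resulting DataFrame
    count() into "repos"
    sumOf { "starsCount"<Int>() } into "sumStars"
    maxOf { "starsCount"<Int>() } into "maxStars"
}
stats.columnNames() // [isIntellij, repos, sumStars, maxStars]
```

The result has one row per distinct key value, here one for `true` and one for `false`.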

## Sorting Rows

`.sortBy { }`/`.sortByDesc { }` sort rows by the values in the selected columns, returning a new `DataFrame` with the sorted rows.

`take(n)` returns a new `DataFrame` with the first `n` rows.

Combine them to get the top 10 repositories by number of stars:

In [22]:
val dfTop10 = dfWithIsIntellij
    // Sort by "starsCount" value descending
    .sortByDesc { starsCount }
    .take(10)
dfTop10

name,starsCount,topics,isIntellij
kotlin,39402,"[compiler, gradle-plugin, intellij-pl...",False
intellij-community,12926,"[code-editor, ide, intellij, intellij...",True
kotlin-native,7101,"[c, compiler, kotlin, llvm, objective-c]",False
compose-jb,6805,"[android, awt, compose, declarative-u...",False
ideavim,6120,"[ideavim, intellij, intellij-platform...",True
JetBrainsMono,6059,"[coding-font, font, ligatures, monosp...",False
Exposed,5688,"[dao, kotlin, orm, sql]",False
ring-ui,2836,"[components, jetbrains-ui, react]",False
kotlinconf-app,2628,[],False
create-react-kotlin-app,2424,"[create-react-app, jetbrains-ui, kotl...",False


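The same sort-then-take pattern, sketched on a hypothetical `sample` frame with columns addressed by name:

```kotlin
import org.jetbrains.kotlinx.dataframe.api.*

// Hypothetical frame; rows are deliberately out of order
val sample = dataFrameOf("name", "starsCount")(
    "Exposed", 5688,
    "kotlin", 39402,
    "ideavim", 6120,
)

// Sort descending by "starsCount", then keep the first two rows
val top2 = sample.sortByDesc("starsCount").take(2)
top2["name"].toList() // [kotlin, ideavim]
```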
## Plotting With Kandy

Kandy is a Kotlin plotting library designed to bring Kotlin DataFrame features into chart creation, providing a convenient and typesafe way to build data visualizations.

Kandy can be loaded into a notebook using `%use kandy`:

In [23]:
%use kandy

Build a simple bar chart with the `.plot { }` extension for `DataFrame`, which allows using extension properties inside the Kandy plotting DSL (the plot is rendered as the cell output after execution):

In [24]:
dfTop10.plot {
    bars {
        x(name)
        y(starsCount)
    }

    layout.title = "Top 10 JetBrains repositories by stars count"
}

## Write DataFrame

`DataFrame` supports writing to (almost) all formats that it is capable of reading.

Write to Excel:

In [25]:
dfWithIsIntellij.writeExcel("jb_repos.xlsx")


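Other formats follow the same pattern. The sketch below writes a hypothetical `sample` frame to CSV and reads it back; it assumes that `writeCsv` is available alongside `readCsv` in the installed DataFrame version, and the file name `sample_repos.csv` is arbitrary:

```kotlin
import org.jetbrains.kotlinx.dataframe.DataFrame
import org.jetbrains.kotlinx.dataframe.api.*
import org.jetbrains.kotlinx.dataframe.io.*

// Hypothetical two-row frame
val sample = dataFrameOf("name", "starsCount")(
    "kotlin", 39402,
    "ideavim", 6120,
)

// Write to a CSV file in the working directory (assumed API: writeCsv)
sample.writeCsv("sample_repos.csv")

// Read the file back to confirm the round trip
val restored = DataFrame.readCsv("sample_repos.csv")
restored.rowsCount() // 2
```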