{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "This page is available as an executable or viewable Jupyter Notebook.\n", "
" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Stratified sampling\n", "\n", "In large dataset a relatively small group of points might be overplotted by the dominant group. In this case __stratified sampling__ can help." ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", " " ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "%useLatestDescriptors\n", "%use lets-plot" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [], "source": [ "import java.util.Random\n", "\n", "val N = 5000 \n", "val smallGroup = 3\n", "val largeGroup = N - smallGroup\n", "\n", "val rand = Random(123)\n", "val data = mapOf (\n", " \"x\" to List(N) { rand.nextGaussian() },\n", " \"y\" to List(N) { rand.nextGaussian() },\n", " \"cond\" to List(smallGroup) { 'A' } + List(largeGroup) { 'B' }\n", ")" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", " " ] }, "execution_count": 3, "metadata": { "new_classpath": [] }, "output_type": "execute_result" } ], "source": [ "// Data points in group 'A' (small group) are overplotted by the dominant group 'B'.\n", "val p = ggplot(data) { x=\"x\"; y=\"y\"; color=\"cond\" } + \n", " scale_color_manual(values=listOf(\"red\", \"#1C9E77\"), breaks=listOf('A', 'B'))\n", "p + geom_point(size=5, alpha=.2)" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", " " ] }, "execution_count": 4, "metadata": { "new_classpath": [] }, "output_type": "execute_result" } ], "source": [ "// Plain random sampling of 50 points loses group 'A' altogether.\n", "p + geom_point(size=5, sampling=sampling_random(50, seed=2))" ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", " " ] }, "execution_count": 5, "metadata": { "new_classpath": [] }, "output_type": "execute_result" } ], "source": [ "// Stratified sampling ensures that group 'A' is represented.\n", "p + geom_point(size=5, sampling=sampling_random_stratified(50, seed=2))" ] } ], "metadata": { "kernelspec": { "display_name": "Kotlin", "language": "kotlin", "name": "kotlin" }, "language_info": { "codemirror_mode": "text/x-kotlin", "file_extension": ".kt", "mimetype": "text/x-kotlin", "name": "kotlin", "pygments_lexer": "kotlin", "version": "1.4.30-dev-2223" } }, "nbformat": 4, "nbformat_minor": 4 }