{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Combining Hugging Face datasets with dask\n", "\n", "> Using 🤗 datasets in combination with dask \n", "\n", "- toc: true \n", "- comments: true\n", "- categories: [huggingface, huggingface-datasets, dask]\n", "- search_exclude: false\n", "- badges: true\n", "- image: images/dask_plot_example.png" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Hugging Face datasets is a super useful library for loading, processing and sharing datasets with other people. \n", "\n", "For many pre-processing steps it works beautifully. The one area where it can be a bit trickier to use is exploratory data analysis (EDA). This kind of column-wise EDA is often an important early step when working with a new dataset or when preparing a data card. \n", "\n", "Fortunately, combining datasets with another data library, [dask](https://www.dask.org/), works pretty smoothly. This isn't intended to be a full introduction to either datasets or dask, but it should give you a sense of how both libraries work and how they can complement each other. " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "First, make sure we have the required libraries installed. [Rich](https://rich.readthedocs.io/en/stable/) is there for a little added visual flair ✨ " ] }, { "cell_type": "code", "execution_count": 53, "metadata": { "id": "lIYdn1woOS1n" }, "outputs": [], "source": [ "%%capture\n", "!pip install datasets toolz \"rich[jupyter]\" \"dask[distributed]\"" ] }, { "cell_type": "code", "execution_count": 1, "metadata": { "id": "gKumkrtPnvdg" }, "outputs": [], "source": [ "%load_ext rich" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Load some data" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "For this example we will use the [blbooksgenre dataset](https://huggingface.co/datasets/blbooksgenre), which contains metadata about some digitised books from the British Library. 
This collection also includes some annotations for the genre of each book, which we could use to train a machine learning model. \n", "\n", "We can load a dataset hosted on the Hugging Face Hub using the `load_dataset` function." ] }, { "cell_type": "code", "execution_count": 2, "metadata": { "id": "21uMQyUFhryg" }, "outputs": [], "source": [ "from datasets import load_dataset" ] }, { "cell_type": "code", "execution_count": 3, "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 87, "referenced_widgets": [ "408dd31921ee40bb9c9fc8995d4b8577", "5e1150b6dfb8494195191efaeb5b7feb", "6edda0b1149541499ab485df06711944", "45fcaeca1a184768a624607172ab7d72", "ea28bfd1711148c29b494a0194d2ffbf", "b3eaa93967914d7f8977d736611b845f", "5d3ee4cdf67e44878bac361e143b828a", "bff82a2eab2a4f7d87f8037e30839050", "0c0752e8e9024977979562531ad5a4b7", "4ccb508cf3ab4001a30cea47d2448e23", "88809da60e3846dc886cd2545edbc5c3" ] }, "id": "P8C8Ljd1i1zj", "outputId": "1b619316-4f2d-4f07-862b-08747f6a715e" }, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "Reusing dataset bl_books_genre (/Users/dvanstrien/.cache/huggingface/datasets/bl_books_genre/annotated_raw/1.1.0/1e01f82403b3d9344121c3b81e5ad7c130338b250bf95dad4c6ab342c642dbe8)\n" ] } ], "source": [ "ds = load_dataset(\"blbooksgenre\", \"annotated_raw\", split=\"train\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Since we requested only the train split, we get back a single `Dataset` rather than a `DatasetDict`." ] }, { "cell_type": "code", "execution_count": 57, "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 131 }, "id": "Wx9ZlPhhjAiC", "outputId": "d7e74115-796c-4acd-9f15-aae838cffdee" }, "outputs": [ { "data": { "text/html": [ "
\n",
       "Dataset({\n",
       "    features: ['BL record ID', 'Name', 'Dates associated with name', 'Type of name', 'Role', 'All names', 'Title', 'Variant titles', 'Series title', 'Number within series', 'Country of publication', 'Place of publication', 'Publisher', 'Date of publication', 'Edition', 'Physical description', 'Dewey classification', 'BL shelfmark', 'Topics', 'Genre', 'Languages', 'Notes', 'BL record ID for physical resource', 'classification_id', 'user_id', 'subject_ids', 'annotator_date_pub', 'annotator_normalised_date_pub', 'annotator_edition_statement', 'annotator_FAST_genre_terms', 'annotator_FAST_subject_terms', 'annotator_comments', 'annotator_main_language', 'annotator_other_languages_summaries', 'annotator_summaries_language', 'annotator_translation', 'annotator_original_language', 'annotator_publisher', 'annotator_place_pub', 'annotator_country', 'annotator_title', 'Link to digitised book', 'annotated', 'Type of resource', 'created_at', 'annotator_genre'],\n",
       "    num_rows: 4398\n",
       "})\n",
       "
\n" ], "text/plain": [ "\n", "\u001b[1;35mDataset\u001b[0m\u001b[1m(\u001b[0m\u001b[1m{\u001b[0m\n", " features: \u001b[1m[\u001b[0m\u001b[32m'BL record ID'\u001b[0m, \u001b[32m'Name'\u001b[0m, \u001b[32m'Dates associated with name'\u001b[0m, \u001b[32m'Type of name'\u001b[0m, \u001b[32m'Role'\u001b[0m, \u001b[32m'All names'\u001b[0m, \u001b[32m'Title'\u001b[0m, \u001b[32m'Variant titles'\u001b[0m, \u001b[32m'Series title'\u001b[0m, \u001b[32m'Number within series'\u001b[0m, \u001b[32m'Country of publication'\u001b[0m, \u001b[32m'Place of publication'\u001b[0m, \u001b[32m'Publisher'\u001b[0m, \u001b[32m'Date of publication'\u001b[0m, \u001b[32m'Edition'\u001b[0m, \u001b[32m'Physical description'\u001b[0m, \u001b[32m'Dewey classification'\u001b[0m, \u001b[32m'BL shelfmark'\u001b[0m, \u001b[32m'Topics'\u001b[0m, \u001b[32m'Genre'\u001b[0m, \u001b[32m'Languages'\u001b[0m, \u001b[32m'Notes'\u001b[0m, \u001b[32m'BL record ID for physical resource'\u001b[0m, \u001b[32m'classification_id'\u001b[0m, \u001b[32m'user_id'\u001b[0m, \u001b[32m'subject_ids'\u001b[0m, \u001b[32m'annotator_date_pub'\u001b[0m, \u001b[32m'annotator_normalised_date_pub'\u001b[0m, \u001b[32m'annotator_edition_statement'\u001b[0m, \u001b[32m'annotator_FAST_genre_terms'\u001b[0m, \u001b[32m'annotator_FAST_subject_terms'\u001b[0m, \u001b[32m'annotator_comments'\u001b[0m, \u001b[32m'annotator_main_language'\u001b[0m, \u001b[32m'annotator_other_languages_summaries'\u001b[0m, \u001b[32m'annotator_summaries_language'\u001b[0m, \u001b[32m'annotator_translation'\u001b[0m, \u001b[32m'annotator_original_language'\u001b[0m, \u001b[32m'annotator_publisher'\u001b[0m, \u001b[32m'annotator_place_pub'\u001b[0m, \u001b[32m'annotator_country'\u001b[0m, \u001b[32m'annotator_title'\u001b[0m, \u001b[32m'Link to digitised book'\u001b[0m, \u001b[32m'annotated'\u001b[0m, \u001b[32m'Type of resource'\u001b[0m, \u001b[32m'created_at'\u001b[0m, \u001b[32m'annotator_genre'\u001b[0m\u001b[1m]\u001b[0m,\n", " num_rows: 
\u001b[1;36m4398\u001b[0m\n", "\u001b[1m}\u001b[0m\u001b[1m)\u001b[0m\n" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "ds" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We can see this dataset has a lot of columns. One of particular interest is the `Date of publication` column. Since we could use this dataset to train some type of classifier, we may want to check whether we have enough examples across different time periods. " ] }, { "cell_type": "code", "execution_count": 58, "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 34 }, "id": "Iz1qpri2jA7F", "outputId": "6927a531-4649-4ec8-b40e-0d467f735a8d" }, "outputs": [ { "data": { "text/html": [ "
'1879'\n",
       "
\n" ], "text/plain": [ "\u001b[32m'1879'\u001b[0m\n" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "ds[0][\"Date of publication\"]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Using toolz to calculate frequencies for a column" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "One quick way to get the frequency counts for a column is the wonderful [toolz](https://toolz.readthedocs.io/en/latest/index.html) library.\n", "\n", "If our data fits in memory, we can simply pass a column containing categorical values to the `frequencies` function to get a count of how often each value appears. " ] }, { "cell_type": "code", "execution_count": 60, "metadata": { "id": "wipOCP-wjB3Y" }, "outputs": [], "source": [ "from toolz import frequencies" ] }, { "cell_type": "code", "execution_count": 7, "metadata": { "id": "iy8JiqQNjFUw" }, "outputs": [], "source": [ "dates = ds[\"Date of publication\"]" ] }, { "cell_type": "code", "execution_count": 97, "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 1000 }, "id": "s1WXFnFljx5B", "outputId": "9d4e5792-30e1-4279-9534-4574a39d321b" }, "outputs": [ { "data": { "text/html": [ "
\n",
       "{\n",
       "    '1879': 99,\n",
       "    '1774': 5,\n",
       "    '1765': 5,\n",
       "    '1877': 69,\n",
       "    '1893': 222,\n",
       "    '1891': 148,\n",
       "    '1827': 29,\n",
       "    '1868': 42,\n",
       "    '1878': 72,\n",
       "    '1895': 189,\n",
       "    '1897': 120,\n",
       "    '1899': 104,\n",
       "    '1896': 174,\n",
       "    '1876': 48,\n",
       "    '1812': 13,\n",
       "    '1799': 8,\n",
       "    '1830': 32,\n",
       "    '1870': 42,\n",
       "    '1894': 155,\n",
       "    '1864': 28,\n",
       "    '1855': 42,\n",
       "    '1871': 42,\n",
       "    '1836': 37,\n",
       "    '1883': 51,\n",
       "    '1880': 111,\n",
       "    '1884': 69,\n",
       "    '1822': 16,\n",
       "    '1856': 38,\n",
       "    '1872': 42,\n",
       "    '1875': 57,\n",
       "    '1844': 35,\n",
       "    '1890': 134,\n",
       "    '1886': 43,\n",
       "    '1840': 15,\n",
       "    '1888': 109,\n",
       "    '1858': 43,\n",
       "    '1867': 53,\n",
       "    '1826': 24,\n",
       "    '1800': 3,\n",
       "    '1851': 43,\n",
       "    '1838': 14,\n",
       "    '1824': 20,\n",
       "    '1887': 58,\n",
       "    '1874': 42,\n",
       "    '1857': 44,\n",
       "    '1873': 34,\n",
       "    '1837': 16,\n",
       "    '1846': 32,\n",
       "    '1881': 55,\n",
       "    '1898': 104,\n",
       "    '1906': 4,\n",
       "    '1892': 134,\n",
       "    '1869': 25,\n",
       "    '1885': 69,\n",
       "    '1882': 71,\n",
       "    '1863': 55,\n",
       "    '1865': 53,\n",
       "    '1635': 3,\n",
       "    '1859': 39,\n",
       "    '1818': 17,\n",
       "    '1845': 28,\n",
       "    '1852': 43,\n",
       "    '1841': 23,\n",
       "    '1842': 29,\n",
       "    '1848': 28,\n",
       "    '1828': 23,\n",
       "    '1850': 38,\n",
       "    '1860': 45,\n",
       "    '1889': 140,\n",
       "    '1815': 5,\n",
       "    '1861': 28,\n",
       "    '1814': 13,\n",
       "    '1843': 28,\n",
       "    '1817': 12,\n",
       "    '1819': 16,\n",
       "    '1853': 34,\n",
       "    '1833': 5,\n",
       "    '1854': 36,\n",
       "    '1839': 33,\n",
       "    '1803': 7,\n",
       "    '1835': 14,\n",
       "    '1813': 8,\n",
       "    '1695': 4,\n",
       "    '1809-1811': 5,\n",
       "    '1832': 9,\n",
       "    '1823': 17,\n",
       "    '1847': 28,\n",
       "    '1816': 8,\n",
       "    '1806': 5,\n",
       "    '1866': 26,\n",
       "    '1829': 13,\n",
       "    '1791': 5,\n",
       "    '1637': 5,\n",
       "    '1821': 4,\n",
       "    '1807': 14,\n",
       "    '1862': 22,\n",
       "    '1795': 5,\n",
       "    '1834': 12,\n",
       "    '1831': 10,\n",
       "    '1849': 13,\n",
       "    '1811': 1,\n",
       "    '1825': 1,\n",
       "    '1809': 3,\n",
       "    '1905': 1,\n",
       "    '1808': 1,\n",
       "    '1900': 5,\n",
       "    '1892-1912': 1,\n",
       "    '1804': 4,\n",
       "    '1769': 5,\n",
       "    '1910': 1,\n",
       "    '1805': 5,\n",
       "    '1802': 3,\n",
       "    '1871-': 1,\n",
       "    '1901': 5,\n",
       "    '1884-1909': 1,\n",
       "    '1873-1887': 1,\n",
       "    '1979': 1,\n",
       "    '1852-1941': 1,\n",
       "    '1903': 1,\n",
       "    '1871-1873': 1,\n",
       "    '1810': 3,\n",
       "    '1907': 1,\n",
       "    '1820': 5,\n",
       "    '1789': 5\n",
       "}\n",
       "
\n" ], "text/plain": [ "\n", "\u001b[1m{\u001b[0m\n", " \u001b[32m'1879'\u001b[0m: \u001b[1;36m99\u001b[0m,\n", " \u001b[32m'1774'\u001b[0m: \u001b[1;36m5\u001b[0m,\n", " \u001b[32m'1765'\u001b[0m: \u001b[1;36m5\u001b[0m,\n", " \u001b[32m'1877'\u001b[0m: \u001b[1;36m69\u001b[0m,\n", " \u001b[32m'1893'\u001b[0m: \u001b[1;36m222\u001b[0m,\n", " \u001b[32m'1891'\u001b[0m: \u001b[1;36m148\u001b[0m,\n", " \u001b[32m'1827'\u001b[0m: \u001b[1;36m29\u001b[0m,\n", " \u001b[32m'1868'\u001b[0m: \u001b[1;36m42\u001b[0m,\n", " \u001b[32m'1878'\u001b[0m: \u001b[1;36m72\u001b[0m,\n", " \u001b[32m'1895'\u001b[0m: \u001b[1;36m189\u001b[0m,\n", " \u001b[32m'1897'\u001b[0m: \u001b[1;36m120\u001b[0m,\n", " \u001b[32m'1899'\u001b[0m: \u001b[1;36m104\u001b[0m,\n", " \u001b[32m'1896'\u001b[0m: \u001b[1;36m174\u001b[0m,\n", " \u001b[32m'1876'\u001b[0m: \u001b[1;36m48\u001b[0m,\n", " \u001b[32m'1812'\u001b[0m: \u001b[1;36m13\u001b[0m,\n", " \u001b[32m'1799'\u001b[0m: \u001b[1;36m8\u001b[0m,\n", " \u001b[32m'1830'\u001b[0m: \u001b[1;36m32\u001b[0m,\n", " \u001b[32m'1870'\u001b[0m: \u001b[1;36m42\u001b[0m,\n", " \u001b[32m'1894'\u001b[0m: \u001b[1;36m155\u001b[0m,\n", " \u001b[32m'1864'\u001b[0m: \u001b[1;36m28\u001b[0m,\n", " \u001b[32m'1855'\u001b[0m: \u001b[1;36m42\u001b[0m,\n", " \u001b[32m'1871'\u001b[0m: \u001b[1;36m42\u001b[0m,\n", " \u001b[32m'1836'\u001b[0m: \u001b[1;36m37\u001b[0m,\n", " \u001b[32m'1883'\u001b[0m: \u001b[1;36m51\u001b[0m,\n", " \u001b[32m'1880'\u001b[0m: \u001b[1;36m111\u001b[0m,\n", " \u001b[32m'1884'\u001b[0m: \u001b[1;36m69\u001b[0m,\n", " \u001b[32m'1822'\u001b[0m: \u001b[1;36m16\u001b[0m,\n", " \u001b[32m'1856'\u001b[0m: \u001b[1;36m38\u001b[0m,\n", " \u001b[32m'1872'\u001b[0m: \u001b[1;36m42\u001b[0m,\n", " \u001b[32m'1875'\u001b[0m: \u001b[1;36m57\u001b[0m,\n", " \u001b[32m'1844'\u001b[0m: \u001b[1;36m35\u001b[0m,\n", " \u001b[32m'1890'\u001b[0m: \u001b[1;36m134\u001b[0m,\n", " \u001b[32m'1886'\u001b[0m: \u001b[1;36m43\u001b[0m,\n", " 
\u001b[32m'1840'\u001b[0m: \u001b[1;36m15\u001b[0m,\n", " \u001b[32m'1888'\u001b[0m: \u001b[1;36m109\u001b[0m,\n", " \u001b[32m'1858'\u001b[0m: \u001b[1;36m43\u001b[0m,\n", " \u001b[32m'1867'\u001b[0m: \u001b[1;36m53\u001b[0m,\n", " \u001b[32m'1826'\u001b[0m: \u001b[1;36m24\u001b[0m,\n", " \u001b[32m'1800'\u001b[0m: \u001b[1;36m3\u001b[0m,\n", " \u001b[32m'1851'\u001b[0m: \u001b[1;36m43\u001b[0m,\n", " \u001b[32m'1838'\u001b[0m: \u001b[1;36m14\u001b[0m,\n", " \u001b[32m'1824'\u001b[0m: \u001b[1;36m20\u001b[0m,\n", " \u001b[32m'1887'\u001b[0m: \u001b[1;36m58\u001b[0m,\n", " \u001b[32m'1874'\u001b[0m: \u001b[1;36m42\u001b[0m,\n", " \u001b[32m'1857'\u001b[0m: \u001b[1;36m44\u001b[0m,\n", " \u001b[32m'1873'\u001b[0m: \u001b[1;36m34\u001b[0m,\n", " \u001b[32m'1837'\u001b[0m: \u001b[1;36m16\u001b[0m,\n", " \u001b[32m'1846'\u001b[0m: \u001b[1;36m32\u001b[0m,\n", " \u001b[32m'1881'\u001b[0m: \u001b[1;36m55\u001b[0m,\n", " \u001b[32m'1898'\u001b[0m: \u001b[1;36m104\u001b[0m,\n", " \u001b[32m'1906'\u001b[0m: \u001b[1;36m4\u001b[0m,\n", " \u001b[32m'1892'\u001b[0m: \u001b[1;36m134\u001b[0m,\n", " \u001b[32m'1869'\u001b[0m: \u001b[1;36m25\u001b[0m,\n", " \u001b[32m'1885'\u001b[0m: \u001b[1;36m69\u001b[0m,\n", " \u001b[32m'1882'\u001b[0m: \u001b[1;36m71\u001b[0m,\n", " \u001b[32m'1863'\u001b[0m: \u001b[1;36m55\u001b[0m,\n", " \u001b[32m'1865'\u001b[0m: \u001b[1;36m53\u001b[0m,\n", " \u001b[32m'1635'\u001b[0m: \u001b[1;36m3\u001b[0m,\n", " \u001b[32m'1859'\u001b[0m: \u001b[1;36m39\u001b[0m,\n", " \u001b[32m'1818'\u001b[0m: \u001b[1;36m17\u001b[0m,\n", " \u001b[32m'1845'\u001b[0m: \u001b[1;36m28\u001b[0m,\n", " \u001b[32m'1852'\u001b[0m: \u001b[1;36m43\u001b[0m,\n", " \u001b[32m'1841'\u001b[0m: \u001b[1;36m23\u001b[0m,\n", " \u001b[32m'1842'\u001b[0m: \u001b[1;36m29\u001b[0m,\n", " \u001b[32m'1848'\u001b[0m: \u001b[1;36m28\u001b[0m,\n", " \u001b[32m'1828'\u001b[0m: \u001b[1;36m23\u001b[0m,\n", " \u001b[32m'1850'\u001b[0m: \u001b[1;36m38\u001b[0m,\n", " \u001b[32m'1860'\u001b[0m: 
\u001b[1;36m45\u001b[0m,\n", " \u001b[32m'1889'\u001b[0m: \u001b[1;36m140\u001b[0m,\n", " \u001b[32m'1815'\u001b[0m: \u001b[1;36m5\u001b[0m,\n", " \u001b[32m'1861'\u001b[0m: \u001b[1;36m28\u001b[0m,\n", " \u001b[32m'1814'\u001b[0m: \u001b[1;36m13\u001b[0m,\n", " \u001b[32m'1843'\u001b[0m: \u001b[1;36m28\u001b[0m,\n", " \u001b[32m'1817'\u001b[0m: \u001b[1;36m12\u001b[0m,\n", " \u001b[32m'1819'\u001b[0m: \u001b[1;36m16\u001b[0m,\n", " \u001b[32m'1853'\u001b[0m: \u001b[1;36m34\u001b[0m,\n", " \u001b[32m'1833'\u001b[0m: \u001b[1;36m5\u001b[0m,\n", " \u001b[32m'1854'\u001b[0m: \u001b[1;36m36\u001b[0m,\n", " \u001b[32m'1839'\u001b[0m: \u001b[1;36m33\u001b[0m,\n", " \u001b[32m'1803'\u001b[0m: \u001b[1;36m7\u001b[0m,\n", " \u001b[32m'1835'\u001b[0m: \u001b[1;36m14\u001b[0m,\n", " \u001b[32m'1813'\u001b[0m: \u001b[1;36m8\u001b[0m,\n", " \u001b[32m'1695'\u001b[0m: \u001b[1;36m4\u001b[0m,\n", " \u001b[32m'1809-1811'\u001b[0m: \u001b[1;36m5\u001b[0m,\n", " \u001b[32m'1832'\u001b[0m: \u001b[1;36m9\u001b[0m,\n", " \u001b[32m'1823'\u001b[0m: \u001b[1;36m17\u001b[0m,\n", " \u001b[32m'1847'\u001b[0m: \u001b[1;36m28\u001b[0m,\n", " \u001b[32m'1816'\u001b[0m: \u001b[1;36m8\u001b[0m,\n", " \u001b[32m'1806'\u001b[0m: \u001b[1;36m5\u001b[0m,\n", " \u001b[32m'1866'\u001b[0m: \u001b[1;36m26\u001b[0m,\n", " \u001b[32m'1829'\u001b[0m: \u001b[1;36m13\u001b[0m,\n", " \u001b[32m'1791'\u001b[0m: \u001b[1;36m5\u001b[0m,\n", " \u001b[32m'1637'\u001b[0m: \u001b[1;36m5\u001b[0m,\n", " \u001b[32m'1821'\u001b[0m: \u001b[1;36m4\u001b[0m,\n", " \u001b[32m'1807'\u001b[0m: \u001b[1;36m14\u001b[0m,\n", " \u001b[32m'1862'\u001b[0m: \u001b[1;36m22\u001b[0m,\n", " \u001b[32m'1795'\u001b[0m: \u001b[1;36m5\u001b[0m,\n", " \u001b[32m'1834'\u001b[0m: \u001b[1;36m12\u001b[0m,\n", " \u001b[32m'1831'\u001b[0m: \u001b[1;36m10\u001b[0m,\n", " \u001b[32m'1849'\u001b[0m: \u001b[1;36m13\u001b[0m,\n", " \u001b[32m'1811'\u001b[0m: \u001b[1;36m1\u001b[0m,\n", " \u001b[32m'1825'\u001b[0m: \u001b[1;36m1\u001b[0m,\n", " 
\u001b[32m'1809'\u001b[0m: \u001b[1;36m3\u001b[0m,\n", " \u001b[32m'1905'\u001b[0m: \u001b[1;36m1\u001b[0m,\n", " \u001b[32m'1808'\u001b[0m: \u001b[1;36m1\u001b[0m,\n", " \u001b[32m'1900'\u001b[0m: \u001b[1;36m5\u001b[0m,\n", " \u001b[32m'1892-1912'\u001b[0m: \u001b[1;36m1\u001b[0m,\n", " \u001b[32m'1804'\u001b[0m: \u001b[1;36m4\u001b[0m,\n", " \u001b[32m'1769'\u001b[0m: \u001b[1;36m5\u001b[0m,\n", " \u001b[32m'1910'\u001b[0m: \u001b[1;36m1\u001b[0m,\n", " \u001b[32m'1805'\u001b[0m: \u001b[1;36m5\u001b[0m,\n", " \u001b[32m'1802'\u001b[0m: \u001b[1;36m3\u001b[0m,\n", " \u001b[32m'1871-'\u001b[0m: \u001b[1;36m1\u001b[0m,\n", " \u001b[32m'1901'\u001b[0m: \u001b[1;36m5\u001b[0m,\n", " \u001b[32m'1884-1909'\u001b[0m: \u001b[1;36m1\u001b[0m,\n", " \u001b[32m'1873-1887'\u001b[0m: \u001b[1;36m1\u001b[0m,\n", " \u001b[32m'1979'\u001b[0m: \u001b[1;36m1\u001b[0m,\n", " \u001b[32m'1852-1941'\u001b[0m: \u001b[1;36m1\u001b[0m,\n", " \u001b[32m'1903'\u001b[0m: \u001b[1;36m1\u001b[0m,\n", " \u001b[32m'1871-1873'\u001b[0m: \u001b[1;36m1\u001b[0m,\n", " \u001b[32m'1810'\u001b[0m: \u001b[1;36m3\u001b[0m,\n", " \u001b[32m'1907'\u001b[0m: \u001b[1;36m1\u001b[0m,\n", " \u001b[32m'1820'\u001b[0m: \u001b[1;36m5\u001b[0m,\n", " \u001b[32m'1789'\u001b[0m: \u001b[1;36m5\u001b[0m\n", "\u001b[1m}\u001b[0m\n" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "# collapse_hide\n", "\n", "frequencies(dates)" ] }, { "cell_type": "markdown", "metadata": { "id": "ko_frygKn1go" }, "source": [ "## Make it parallel!\n", "\n", "If our data doesn't fit in memory or we want to do things in parallel we might want to use a slightly different approach. This is where dask can play a role. \n", "\n", "Dask offers a number of different collection abstractions that make it easier to do things in parallel. 
One of these is the dask bag, which we'll use here.\n", "\n", "First, we'll create a dask client. I won't dig into the details here, but you can get a good overview in the [getting started](https://www.dask.org/get-started) pages. " ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [], "source": [ "from dask.distributed import Client" ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [], "source": [ "client = Client()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Since we don't want to load all of our data into memory, we can create a generator that yields one row at a time. In this case, we'll start by exploring the `Title` column." ] }, { "cell_type": "code", "execution_count": 63, "metadata": { "id": "modIC-hIj0Vs" }, "outputs": [], "source": [ "def yield_titles():\n", "    for row in ds:\n", "        yield row[\"Title\"]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We can see that calling this function returns a generator." ] }, { "cell_type": "code", "execution_count": 64, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "qDFTGtUuj6xk", "outputId": "0fe3d283-19ec-4c47-e851-dca5292759c5" }, "outputs": [ { "data": { "text/html": [ "
<generator object yield_titles at 0x7ffc28fdc040>\n",
       "
\n" ], "text/plain": [ "\u001b[1m<\u001b[0m\u001b[1;95mgenerator\u001b[0m\u001b[39m object yield_titles at \u001b[0m\u001b[1;36m0x7ffc28fdc040\u001b[0m\u001b[1m>\u001b[0m\n" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "yield_titles()" ] }, { "cell_type": "code", "execution_count": 65, "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 36 }, "id": "y3R8YVXykUQX", "outputId": "91bdfcae-1b6e-4c6b-a586-7375b00ee9bc" }, "outputs": [ { "data": { "text/html": [ "
'The Canadian farmer. A missionary incident [Signed: W. J. H. Y, i.e. William J. H. Yates.]'\n",
       "
\n" ], "text/plain": [ "\u001b[32m'The Canadian farmer. A missionary incident \u001b[0m\u001b[32m[\u001b[0m\u001b[32mSigned: W. J. H. Y, i.e. William J. H. Yates.\u001b[0m\u001b[32m]\u001b[0m\u001b[32m'\u001b[0m\n" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "next(iter(yield_titles()))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We can store this generator in a `titles` variable. " ] }, { "cell_type": "code", "execution_count": 21, "metadata": { "id": "HVbIYOEwkVt-" }, "outputs": [], "source": [ "titles = yield_titles()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We'll now import dask bag. " ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "d44WSkiDkhEo" }, "outputs": [], "source": [ "import dask.bag as db" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We can create a dask bag from our generator using the `db.from_sequence` function. " ] }, { "cell_type": "code", "execution_count": 69, "metadata": { "id": "UPKp665fkjmn" }, "outputs": [], "source": [ "bag = db.from_sequence(titles)" ] }, { "cell_type": "code", "execution_count": 70, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
dask.bag<from_sequence, npartitions=1>\n",
       "
\n" ], "text/plain": [ "dask.bag\u001b[1m<\u001b[0m\u001b[1;95mfrom_sequence\u001b[0m\u001b[39m, \u001b[0m\u001b[33mnpartitions\u001b[0m\u001b[39m=\u001b[0m\u001b[1;36m1\u001b[0m\u001b[1m>\u001b[0m\n" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "bag" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We can look at an example using the `take` method." ] }, { "cell_type": "code", "execution_count": 68, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "uSL2Den-kl8O", "outputId": "dacee933-d4c0-4fc1-b7ce-2d4b8ce800b7" }, "outputs": [ { "data": { "text/html": [ "
\n",
       "(\n",
       "    [\n",
       "        'The',\n",
       "        'Canadian',\n",
       "        'farmer.',\n",
       "        'A',\n",
       "        'missionary',\n",
       "        'incident',\n",
       "        '[Signed:',\n",
       "        'W.',\n",
       "        'J.',\n",
       "        'H.',\n",
       "        'Y,',\n",
       "        'i.e.',\n",
       "        'William',\n",
       "        'J.',\n",
       "        'H.',\n",
       "        'Yates.]'\n",
       "    ],\n",
       ")\n",
       "
\n" ], "text/plain": [ "\n", "\u001b[1m(\u001b[0m\n", " \u001b[1m[\u001b[0m\n", " \u001b[32m'The'\u001b[0m,\n", " \u001b[32m'Canadian'\u001b[0m,\n", " \u001b[32m'farmer.'\u001b[0m,\n", " \u001b[32m'A'\u001b[0m,\n", " \u001b[32m'missionary'\u001b[0m,\n", " \u001b[32m'incident'\u001b[0m,\n", " \u001b[32m'\u001b[0m\u001b[32m[\u001b[0m\u001b[32mSigned:'\u001b[0m,\n", " \u001b[32m'W.'\u001b[0m,\n", " \u001b[32m'J.'\u001b[0m,\n", " \u001b[32m'H.'\u001b[0m,\n", " \u001b[32m'Y,'\u001b[0m,\n", " \u001b[32m'i.e.'\u001b[0m,\n", " \u001b[32m'William'\u001b[0m,\n", " \u001b[32m'J.'\u001b[0m,\n", " \u001b[32m'H.'\u001b[0m,\n", " \u001b[32m'Yates.\u001b[0m\u001b[32m]\u001b[0m\u001b[32m'\u001b[0m\n", " \u001b[1m]\u001b[0m,\n", "\u001b[1m)\u001b[0m\n" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "bag.take(1)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Dask bag has a bunch of handy methods for processing data (some of these operations we could also do in 🤗 datasets, but others aren't available as built-in methods there). \n", "\n", "For example, we can make sure we only have unique titles using the `distinct` method. " ] }, { "cell_type": "code", "execution_count": 26, "metadata": { "id": "afOe5tr7kowe" }, "outputs": [], "source": [ "unique_titles = bag.distinct()" ] }, { "cell_type": "code", "execution_count": 27, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "hWVI4HZykxuy", "outputId": "4392f098-1f1d-4950-daeb-679de217d111" }, "outputs": [ { "data": { "text/html": [ "
\n",
       "(\n",
       "    'The Canadian farmer. A missionary incident [Signed: W. J. H. Y, i.e. William J. H. Yates.]',\n",
       "    'A new musical Interlude, called the Election [By M. P. Andrews.]',\n",
       "    'An Elegy written among the ruins of an Abbey. By the author of the Nun [E. Jerningham]',\n",
       "    \"The Baron's Daughter. A ballad by the author of Poetical Recreations [i.e. William C. Hazlitt] . F.P\"\n",
       ")\n",
       "
\n" ], "text/plain": [ "\n", "\u001b[1m(\u001b[0m\n", " \u001b[32m'The Canadian farmer. A missionary incident \u001b[0m\u001b[32m[\u001b[0m\u001b[32mSigned: W. J. H. Y, i.e. William J. H. Yates.\u001b[0m\u001b[32m]\u001b[0m\u001b[32m'\u001b[0m,\n", " \u001b[32m'A new musical Interlude, called the Election \u001b[0m\u001b[32m[\u001b[0m\u001b[32mBy M. P. Andrews.\u001b[0m\u001b[32m]\u001b[0m\u001b[32m'\u001b[0m,\n", " \u001b[32m'An Elegy written among the ruins of an Abbey. By the author of the Nun \u001b[0m\u001b[32m[\u001b[0m\u001b[32mE. Jerningham\u001b[0m\u001b[32m]\u001b[0m\u001b[32m'\u001b[0m,\n", " \u001b[32m\"The Baron's Daughter. A ballad by the author of Poetical Recreations \u001b[0m\u001b[32m[\u001b[0m\u001b[32mi.e. William C. Hazlitt\u001b[0m\u001b[32m]\u001b[0m\u001b[32m . F.P\"\u001b[0m\n", "\u001b[1m)\u001b[0m\n" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "unique_titles.take(4)" ] }, { "cell_type": "markdown", "metadata": { "id": "YCZP1Zh2kzGB" }, "source": [ "Similar to 🤗 datasets, dask bag has a `map` method that we can use to apply a function to every item. In this case we split each title into individual words on spaces. \n" ] }, { "cell_type": "code", "execution_count": 39, "metadata": { "id": "1vrcwsMXlAi6" }, "outputs": [], "source": [ "title_words_split = unique_titles.map(lambda x: x.split(\" \"))" ] }, { "cell_type": "code", "execution_count": 71, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "CIrlsZonlEmu", "outputId": "51037d54-2f8c-4f67-903c-9408c7fc5b4c" }, "outputs": [ { "data": { "text/html": [ "
\n",
       "(\n",
       "    [\n",
       "        'The',\n",
       "        'Canadian',\n",
       "        'farmer.',\n",
       "        'A',\n",
       "        'missionary',\n",
       "        'incident',\n",
       "        '[Signed:',\n",
       "        'W.',\n",
       "        'J.',\n",
       "        'H.',\n",
       "        'Y,',\n",
       "        'i.e.',\n",
       "        'William',\n",
       "        'J.',\n",
       "        'H.',\n",
       "        'Yates.]'\n",
       "    ],\n",
       "    [\n",
       "        'A',\n",
       "        'new',\n",
       "        'musical',\n",
       "        'Interlude,',\n",
       "        'called',\n",
       "        'the',\n",
       "        'Election',\n",
       "        '[By',\n",
       "        'M.',\n",
       "        'P.',\n",
       "        'Andrews.]'\n",
       "    ]\n",
       ")\n",
       "
\n" ], "text/plain": [ "\n", "\u001b[1m(\u001b[0m\n", " \u001b[1m[\u001b[0m\n", " \u001b[32m'The'\u001b[0m,\n", " \u001b[32m'Canadian'\u001b[0m,\n", " \u001b[32m'farmer.'\u001b[0m,\n", " \u001b[32m'A'\u001b[0m,\n", " \u001b[32m'missionary'\u001b[0m,\n", " \u001b[32m'incident'\u001b[0m,\n", " \u001b[32m'\u001b[0m\u001b[32m[\u001b[0m\u001b[32mSigned:'\u001b[0m,\n", " \u001b[32m'W.'\u001b[0m,\n", " \u001b[32m'J.'\u001b[0m,\n", " \u001b[32m'H.'\u001b[0m,\n", " \u001b[32m'Y,'\u001b[0m,\n", " \u001b[32m'i.e.'\u001b[0m,\n", " \u001b[32m'William'\u001b[0m,\n", " \u001b[32m'J.'\u001b[0m,\n", " \u001b[32m'H.'\u001b[0m,\n", " \u001b[32m'Yates.\u001b[0m\u001b[32m]\u001b[0m\u001b[32m'\u001b[0m\n", " \u001b[1m]\u001b[0m,\n", " \u001b[1m[\u001b[0m\n", " \u001b[32m'A'\u001b[0m,\n", " \u001b[32m'new'\u001b[0m,\n", " \u001b[32m'musical'\u001b[0m,\n", " \u001b[32m'Interlude,'\u001b[0m,\n", " \u001b[32m'called'\u001b[0m,\n", " \u001b[32m'the'\u001b[0m,\n", " \u001b[32m'Election'\u001b[0m,\n", " \u001b[32m'\u001b[0m\u001b[32m[\u001b[0m\u001b[32mBy'\u001b[0m,\n", " \u001b[32m'M.'\u001b[0m,\n", " \u001b[32m'P.'\u001b[0m,\n", " \u001b[32m'Andrews.\u001b[0m\u001b[32m]\u001b[0m\u001b[32m'\u001b[0m\n", " \u001b[1m]\u001b[0m\n", "\u001b[1m)\u001b[0m\n" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "title_words_split.take(2)" ] }, { "cell_type": "markdown", "metadata": { "id": "D0LQxfYEk9id" }, "source": [ "We can see that each title is now a list of words. Helpfully, dask bag has a `flatten` method, which concatenates these lists so that all the words end up in a single sequence. " ] }, { "cell_type": "code", "execution_count": 73, "metadata": { "id": "JDBCPDP2lFd2" }, "outputs": [], "source": [ "flattend_title_words = title_words_split.flatten()" ] }, { "cell_type": "code", "execution_count": 74, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "UMiAV4sjlKz4", "outputId": "9c473358-4520-42f0-cb0e-3e6696e86283" }, "outputs": [ { "data": { "text/html": [ "
('The', 'Canadian')\n",
       "
\n" ], "text/plain": [ "\u001b[1m(\u001b[0m\u001b[32m'The'\u001b[0m, \u001b[32m'Canadian'\u001b[0m\u001b[1m)\u001b[0m\n" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "flattend_title_words.take(2)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We could now use the `frequencies` method to get the top words. " ] }, { "cell_type": "code", "execution_count": 75, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "4NIeaL0klZhI", "outputId": "7d7a32e3-6a82-4437-996d-ba325dc18758" }, "outputs": [], "source": [ "freqs = flattend_title_words.frequencies(sort=True)" ] }, { "cell_type": "code", "execution_count": 76, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
dask.bag<sorted, npartitions=1>\n",
       "
\n" ], "text/plain": [ "dask.bag\u001b[1m<\u001b[0m\u001b[1;95msorted\u001b[0m\u001b[39m, \u001b[0m\u001b[33mnpartitions\u001b[0m\u001b[39m=\u001b[0m\u001b[1;36m1\u001b[0m\u001b[1m>\u001b[0m\n" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "freqs" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Since dask bag operations are lazy, nothing has actually been computed yet. We can chain on another step to grab just the top 10 words; `key=1` tells `topk` to rank each `(word, count)` pair by its count. " ] }, { "cell_type": "code", "execution_count": 77, "metadata": {}, "outputs": [], "source": [ "top_10_words = freqs.topk(10, key=1)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "To get the actual results, we call `compute`, which executes all of the chained operations on our bag. \n" ] }, { "cell_type": "code", "execution_count": 78, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n",
       "[\n",
       "    ('of', 808),\n",
       "    ('the', 674),\n",
       "    ('and', 550),\n",
       "    ('...', 518),\n",
       "    ('in', 402),\n",
       "    ('van', 306),\n",
       "    ('etc', 301),\n",
       "    ('de', 258),\n",
       "    ('en', 258),\n",
       "    ('a', 231)\n",
       "]\n",
       "
\n" ], "text/plain": [ "\n", "\u001b[1m[\u001b[0m\n", " \u001b[1m(\u001b[0m\u001b[32m'of'\u001b[0m, \u001b[1;36m808\u001b[0m\u001b[1m)\u001b[0m,\n", " \u001b[1m(\u001b[0m\u001b[32m'the'\u001b[0m, \u001b[1;36m674\u001b[0m\u001b[1m)\u001b[0m,\n", " \u001b[1m(\u001b[0m\u001b[32m'and'\u001b[0m, \u001b[1;36m550\u001b[0m\u001b[1m)\u001b[0m,\n", " \u001b[1m(\u001b[0m\u001b[32m'...'\u001b[0m, \u001b[1;36m518\u001b[0m\u001b[1m)\u001b[0m,\n", " \u001b[1m(\u001b[0m\u001b[32m'in'\u001b[0m, \u001b[1;36m402\u001b[0m\u001b[1m)\u001b[0m,\n", " \u001b[1m(\u001b[0m\u001b[32m'van'\u001b[0m, \u001b[1;36m306\u001b[0m\u001b[1m)\u001b[0m,\n", " \u001b[1m(\u001b[0m\u001b[32m'etc'\u001b[0m, \u001b[1;36m301\u001b[0m\u001b[1m)\u001b[0m,\n", " \u001b[1m(\u001b[0m\u001b[32m'de'\u001b[0m, \u001b[1;36m258\u001b[0m\u001b[1m)\u001b[0m,\n", " \u001b[1m(\u001b[0m\u001b[32m'en'\u001b[0m, \u001b[1;36m258\u001b[0m\u001b[1m)\u001b[0m,\n", " \u001b[1m(\u001b[0m\u001b[32m'a'\u001b[0m, \u001b[1;36m231\u001b[0m\u001b[1m)\u001b[0m\n", "\u001b[1m]\u001b[0m\n" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "top_10_words.compute()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We could also do the same with a lowercased version of each word, so that, for example, `The` and `the` are counted together. " ] }, { "cell_type": "code", "execution_count": 79, "metadata": {}, "outputs": [], "source": [ "lowered_title_words = flattend_title_words.map(lambda x: x.lower())" ] }, { "cell_type": "code", "execution_count": 80, "metadata": {}, "outputs": [], "source": [ "freqs = lowered_title_words.frequencies(sort=True)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The `visualize` method draws the task graph behind this computation, which gives some insight into how dask manages the work. 
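As an aside, the chain of bag methods used above (`flatten`, `map` with `lower`, `frequencies` and `topk`) can be mirrored in plain Python, which can be handy for sanity-checking results on a small sample. The sketch below is a stdlib-only illustration built on `collections.Counter`, not how dask implements these methods; the `top_words` helper and the shortened sample titles are hypothetical, and it assumes titles were split on whitespace, which matches the tokens shown above.

```python
from collections import Counter
from itertools import chain


def top_words(titles, k=10, lowercase=True):
    # Split each title on whitespace (assumed to match the earlier tokenization)
    words_per_title = (title.split() for title in titles)
    # Concatenate the per-title lists into one stream, like bag.flatten()
    words = chain.from_iterable(words_per_title)
    if lowercase:
        # Like bag.map(lambda x: x.lower())
        words = (word.lower() for word in words)
    # Count every word (like frequencies) and keep the k most common (like topk)
    return Counter(words).most_common(k)


# Hypothetical sample: the first two titles, shortened
sample_titles = [
    'The Canadian farmer. A missionary incident',
    'A new musical Interlude, called the Election',
]
print(top_words(sample_titles, k=3))
# -> [('the', 2), ('a', 2), ('canadian', 1)]
```

Because `Counter.most_common` keeps first-seen order among words with equal counts, ties may come out in a different order than from dask's `topk`, so compare counts rather than positions when checking against the bag results.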
" ] }, { "cell_type": "code", "execution_count": 81, "metadata": {}, "outputs": [ { "data": { "application/vnd.jupyter.widget-view+json": { "model_id": "d985cef20e334ec49f640e264f51b494", "version_major": 2, "version_minor": 0 }, "text/plain": [ "CytoscapeWidget(cytoscape_layout={'name': 'dagre', 'rankDir': 'BT', 'nodeSep': 10, 'edgeSep': 10, 'spacingFact…" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "freqs.visualize(engine=\"cytoscape\", optimize_graph=True)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Moving from datasets to a dask dataframe \n", "\n", "For some operations, dask bag is super easy to use. Sometimes, though, you will hurt your brain trying to crowbar your problem into the dask bag API 😵‍💫 This is where dask dataframes come in! Using the `to_parquet` method, we can easily save our 🤗 dataset as a parquet file. " ] }, { "cell_type": "code", "execution_count": 44, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "3583138" ] }, "execution_count": 44, "metadata": {}, "output_type": "execute_result" } ], "source": [ "ds.to_parquet(\"genre.parquet\")" ] }, { "cell_type": "code", "execution_count": 84, "metadata": {}, "outputs": [], "source": [ "import dask.dataframe as dd" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "and load it back from this file" ] }, { "cell_type": "code", "execution_count": 85, "metadata": {}, "outputs": [], "source": [ "ddf = dd.read_parquet(\"genre.parquet\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "A dask dataframe works much like a pandas dataframe, but it is lazy by default, so let's see what we get if we just print it out:" ] }, { "cell_type": "code", "execution_count": 86, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
Dask DataFrame Structure:
\n", "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
BL record IDNameDates associated with nameType of nameRoleAll namesTitleVariant titlesSeries titleNumber within seriesCountry of publicationPlace of publicationPublisherDate of publicationEditionPhysical descriptionDewey classificationBL shelfmarkTopicsGenreLanguagesNotesBL record ID for physical resourceclassification_iduser_idsubject_idsannotator_date_pubannotator_normalised_date_pubannotator_edition_statementannotator_FAST_genre_termsannotator_FAST_subject_termsannotator_commentsannotator_main_languageannotator_other_languages_summariesannotator_summaries_languageannotator_translationannotator_original_languageannotator_publisherannotator_place_pubannotator_countryannotator_titleLink to digitised bookannotatedType of resourcecreated_atannotator_genre
npartitions=1
objectobjectobjectobjectobjectobjectobjectobjectobjectobjectobjectobjectobjectobjectobjectobjectobjectobjectobjectobjectobjectobjectobjectobjectobjectobjectobjectobjectobjectobjectobjectobjectobjectobjectobjectobjectobjectobjectobjectobjectobjectobjectboolint64datetime64[ns]int64
..........................................................................................................................................
\n", "
\n", "
Dask Name: read-parquet, 1 tasks
" ] }, "execution_count": 86, "metadata": {}, "output_type": "execute_result" } ], "source": [ "ddf" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "You'll see that we don't actually get back any data, only the structure of the dataframe (its columns and their dtypes). If we call `head`, dask computes just enough to return the number of rows we ask for. " ] }, { "cell_type": "code", "execution_count": 87, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
BL record IDNameDates associated with nameType of nameRoleAll namesTitleVariant titlesSeries titleNumber within series...annotator_original_languageannotator_publisherannotator_place_pubannotator_countryannotator_titleLink to digitised bookannotatedType of resourcecreated_atannotator_genre
0014603046Yates, William Joseph H.person[Yates, William Joseph H. [person] , Y, W. J....The Canadian farmer. A missionary incident [Si......NONELondonenkThe Canadian farmer. A missionary incident [Si...http://access.bl.uk/item/viewer/ark:/81055/vdc...True02020-08-11 14:30:330
1014603046Yates, William Joseph H.person[Yates, William Joseph H. [person] , Y, W. J....The Canadian farmer. A missionary incident [Si......NONELondonenkThe Canadian farmer. A missionary incident [Si...http://access.bl.uk/item/viewer/ark:/81055/vdc...True02021-04-15 09:53:230
2014603046Yates, William Joseph H.person[Yates, William Joseph H. [person] , Y, W. J....The Canadian farmer. A missionary incident [Si......NONELondonenkThe Canadian farmer. A missionary incident [Si...http://access.bl.uk/item/viewer/ark:/81055/vdc...True02020-09-24 14:27:540
\n", "

3 rows × 46 columns

\n", "
" ] }, "execution_count": 87, "metadata": {}, "output_type": "execute_result" } ], "source": [ "ddf.head(3)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We also have many familiar pandas methods available to us, for example `drop_duplicates`:" ] }, { "cell_type": "code", "execution_count": 49, "metadata": {}, "outputs": [], "source": [ "ddf = ddf.drop_duplicates(subset=\"Title\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "As an example of something that would be a bit tricky in datasets, let's compute the mean title length for each year of publication. First, we create a new column holding the length of each title:" ] }, { "cell_type": "code", "execution_count": 89, "metadata": {}, "outputs": [], "source": [ "ddf[\"title_len\"] = ddf[\"Title\"].str.len()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We can then group by the date of publication " ] }, { "cell_type": "code", "execution_count": 90, "metadata": {}, "outputs": [], "source": [ "grouped = ddf.groupby(\"Date of publication\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "and then calculate the mean `title_len` " ] }, { "cell_type": "code", "execution_count": 91, "metadata": {}, "outputs": [], "source": [ "mean_title_len = grouped[\"title_len\"].mean()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "To actually compute the result, we call the `compute` method " ] }, { "cell_type": "code", "execution_count": 92, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n",
       "Date of publication\n",
       "1635    248.0\n",
       "1637     67.0\n",
       "1695     63.0\n",
       "1765     86.0\n",
       "1769     20.0\n",
       "        ...  \n",
       "1905    141.0\n",
       "1906    225.0\n",
       "1907    142.0\n",
       "1910     65.0\n",
       "1979     43.0\n",
       "Name: title_len, Length: 124, dtype: float64\n",
       "
\n" ], "text/plain": [ "\n", "Date of publication\n", "\u001b[1;36m1635\u001b[0m \u001b[1;36m248.0\u001b[0m\n", "\u001b[1;36m1637\u001b[0m \u001b[1;36m67.0\u001b[0m\n", "\u001b[1;36m1695\u001b[0m \u001b[1;36m63.0\u001b[0m\n", "\u001b[1;36m1765\u001b[0m \u001b[1;36m86.0\u001b[0m\n", "\u001b[1;36m1769\u001b[0m \u001b[1;36m20.0\u001b[0m\n", " \u001b[33m...\u001b[0m \n", "\u001b[1;36m1905\u001b[0m \u001b[1;36m141.0\u001b[0m\n", "\u001b[1;36m1906\u001b[0m \u001b[1;36m225.0\u001b[0m\n", "\u001b[1;36m1907\u001b[0m \u001b[1;36m142.0\u001b[0m\n", "\u001b[1;36m1910\u001b[0m \u001b[1;36m65.0\u001b[0m\n", "\u001b[1;36m1979\u001b[0m \u001b[1;36m43.0\u001b[0m\n", "Name: title_len, Length: \u001b[1;36m124\u001b[0m, dtype: float64\n" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "mean_title_len.compute()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We can also create a plot in the usual way " ] }, { "cell_type": "code", "execution_count": 93, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
<AxesSubplot:xlabel='Date of publication'>\n",
       "
\n" ], "text/plain": [ "\u001b[1m<\u001b[0m\u001b[1;95mAxesSubplot:\u001b[0m\u001b[1;33mxlabel\u001b[0m\u001b[39m=\u001b[0m\u001b[32m'Date of publication'\u001b[0m\u001b[1m>\u001b[0m\n" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "
<Figure size 432x288 with 1 Axes>\n",
       "
\n" ], "text/plain": [ "\u001b[1m<\u001b[0m\u001b[1;95mFigure\u001b[0m\u001b[39m size 432x288 with \u001b[0m\u001b[1;36m1\u001b[0m\u001b[39m Axes\u001b[0m\u001b[1m>\u001b[0m\n" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "image/png": "iVBORw0KGgoAAAANSUhEUgAAAXcAAAEGCAYAAACevtWaAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjUuMSwgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy/YYfK9AAAACXBIWXMAAAsTAAALEwEAmpwYAABKDElEQVR4nO29d5icV3n3/7mnb++76qsuW7Zsy5blJht3TGh2DD9McQw4IYBJaAk/eJNQkpgAyZsQWhISIE7AGBuDMQZj3HuRLMm2etdqVbf3MuW8fzxlZ3ZmdmfLzOys7s917TWzzzzzzDlTvs/9fM997iPGGBRFUZTZhSffDVAURVGmHxV3RVGUWYiKu6IoyixExV1RFGUWouKuKIoyC/HluwEAtbW1ZvHixfluhqIoSkHx6quvthpj6lI9NiPEffHixWzatCnfzVAURSkoRORwusfUllEURZmFqLgriqLMQlTcFUVRZiEq7oqiKLMQFXdFUZRZiIq7oijKLETFXVEUZRai4l6ADAxHuf/VZrRcs6Io6VBxL0Ae33WSz973Gofa+vPdFEVRZigq7gXIYDgGwHAklueWKIoyU1FxL0AiUUvUIzEVd0VRUqPiXoBEYpbXrtquKEo6VNwLEI3cFUUZDxX3AsSN3DVbRlGUNKi4FyDhqCXqkaiKu6IoqVFxL0AcWyaqkbuiKGlQcS9AHFsmGlNxVxQlNSruBYgzkKririhKOlTcCxDHa1dxVxQlHSruBUhYxV1RlHFQcS9AomrLKIoyDiruBUjYFvWIiruiKGlQcS9AnFRIncSkKEo6VNwLkIhOYlIUZRxU3AsQN89dI3dFUdKg4l6AaJ67oijjoeJegGgqpKIo46HiXoC4tWVU3BVFSYOKewGitWUURRkPFfcCRMsPKIoyHiruBYg7oKrZMoqipEHFvQDRAVVFUcZDxb0AiarnrijKOKi4FyBhd4FsFXdFUVKTsbiLiFdEtojIQ/b/1SLyqIjstW+r4vb9gojsE5HdIvLmbDT8dMZdIFvFXVGUNEwkcv8ksDPu/88DjxtjVgCP2/8jIquBW4CzgBuA74mId3qaq8BInrtG7oqipCMjcReRBcBbgf+K2/xO4C77/l3AjXHb7zHGDBljDgL7gPXT0loFGBlQ1aqQiqKkI9PI/ZvA54BY3LYGY8xxAPu23t4+HzgSt1+zvU2ZJpyBVK0KqShKOsYVdxF5G3DKGPNqhseUFNuSVEhEPiIim0RkU0tLS4aHVmAkz10jd0VR0pFJ5H4Z8A4ROQTcA1wtIj8GTorIXAD79pS9fzOwMO75C4Bjow9qjPm+MWadMWZdXV3dFLpw+uHYMo7IK4qijGZccTfGfMEYs8AYsxhroPQJY8wHgAeB2+zdbgN+Zd9/ELhFRIIisgRYAbwy7S0/jRkpHJbnhiiKMmPxTeG5XwPuFZHbgSbg3QDGmO0ici+wA4gAdxhjolNuqeIyUjhM1V1RlNRMSNyNMU8BT9n324Br0ux3J3DnFNumpCGiC2QrijIOOkO1wDDGuNkyOolJUZR0qLgXGOG49EeN3BVFSYeKe4ERXyxMUyEVRUmHinuBEY4bRNVJTIqipEPFvcCIF3SN3BVFSYeKe4ERiUtuV89dUZR0qLgXGPGCrot1KIqSDhX3AiPellFxVxQlHSruBUb8gKqKu6Io6VBxLzA0clcUJRNU3AuMcNyAalSzZRRFSYOKe4ER1QFVRVEyQMW9wHBquAe8HhV
3RVHSouJeYDi1ZYJ+FXdFUdKj4l5gOAOqQZ9XxV1RlLSouBcYji0T0shdUZQxUHEvMEYid49myyiKkhYV9wLDidyDPq9WhVQUJS0q7gWGDqgqipIJKu4FhiPoIZ9XbRlFUdKi4l5gODNUNXJXFGUsVNwLDKfkb9Cn4q4oSnpU3AsMZ7EOzXNXFGUsVNwLDCdy1zx3RVHGQsW9wNAZqoqiZIKKe4ERdvPcdRKToijpUXEvMCKj8tyNCryiKClQcS8wHM894PUCoM6MoiipUHEvMCLRGH6v4POK9X/cmqqKoigOKu4FRiRm8HoEj1jirtquKEoqVNwLjHA0ht/jwefRyF1RlPSouBcYkajB5xW8Ho3cFUVJj4p7gRGJGXxejyvuGrkripIKFfcCIxKN4fOMRO6a664oSirGFXcRCYnIKyLymohsF5Gv2NurReRREdlr31bFPecLIrJPRHaLyJuz2YHTDStyjxN3zYVUFCUFmUTuQ8DVxphzgfOAG0TkYuDzwOPGmBXA4/b/iMhq4BbgLOAG4Hsi4s1C209LnAFVFXdFUcZiXHE3Fr32v377zwDvBO6yt98F3GjffydwjzFmyBhzENgHrJ/ORueLR3ec5ETXYF7bEHUid1FxVxQlPRl57iLiFZGtwCngUWPMy0CDMeY4gH1bb+8+HzgS9/Rme9voY35ERDaJyKaWlpYpdCE3GGP42I9f5ScvH85rO8JRg9fjcScxqbgripKKjMTdGBM1xpwHLADWi8jZY+wuqQ6R4pjfN8asM8asq6ury6ix+SQSM0Rihr6haJ7bYc1Q9WjkrijKGEwoW8YY0wk8heWlnxSRuQD27Sl7t2ZgYdzTFgDHptrQfOMsbzcYybO4Rw0+j8RNYlJxVxQlmUyyZepEpNK+XwRcC+wCHgRus3e7DfiVff9B4BYRCYrIEmAF8Mo0tzvnhCOWiA6G8x+5+7wePDqgqijKGPgy2GcucJed8eIB7jXGPCQiLwL3isjtQBPwbgBjzHYRuRfYAUSAO4wx+VXEaWDYidzzLe5RQ8A3Un5AxV1RlFSMK+7GmNeBtSm2twHXpHnOncCdU27dDMK1ZcL5nREajhmK4yN3ncSkKEoKdIZqhoRnTOQewx/nuWvkrihKKlTcM2SmiLvmuSuKkgkq7hkybA+oDuTblonG8OkMVUVRxkHFPUOcyH0o37aM1pZRFCUDVNwzZKbYMlaeu0buiqKMjYp7hripkJH82zJ+r+DzWB+diruiKKlQcc+QcNT23IdnxoCqre06Q1VRlJSouGdIJK78gMljbrkzoOpE7jHNc1cUJQUq7hnieO7GjFg0+SASM/ZKTCP/K4qijEbFPUOGoyMiOjicR3GPOmuo2pG7iruiKClQcc+QcNxAaj4rQzolf51JTBq5K4qSChX3DAnHWTH5SoeMxQwxA16P4LUX69DIXVGUVKi4Z0iiuOfHlgnHrNf1ez0auSuKMiYq7hkS77kP5Clyj9htsAZUtSqkoijpUXHPkJlgyzhRujWgaot7HjN3FEWZuRS0uLf3DfPAlqOc6BrM+mslDKjmLXK32hAfuastoyhKKgpa3I+09/Opn21l+7GurL/WTPDcRyL3EXHXSUyKoqSioMU96LeaP5SDei8Jee55itydE4zf49EFshVFGZOCFveQzwvAUA7yzmeC5x6Ni9w9kt1UyMd2nOSfHtmdlWMripJ9Clrc3cg9BzZJOBrD1tM8Ru6WkHvjltnLVuT+u+0n+MnLh7NybEVRsk9hi7sduedCbMPRGKUBaz3xfJX9jcTluTsLZGcrch8IR/OW8qkoytQpcHHPoeceMZSGLHHPV9nf+Dx35zZbkfvgcJTBcExnwCpKgaLiniHhaIygz0PA58lbbRlHyP12SUivR7I2icnpYz7r6CiKMnkKWtx9XitrJFcDqn6vh5DPkxOPPxVOnruTBun1CNFolmwZ++ok34uTKIoyOQpa3MGK3nM1oOr3eigKePMmeM6Aqs8
bJ+5ZitwH7PdUfXdFKUwKX9z93pxYB8NRg9/nIZSj10tF/IAq2OKeJU98KKyRu6JMlaFIlB88dzAhlTpXFL645yhyj0RjBLxCyOfNY/mB5AHVbIm7E7Fr5K4ok+eFfW383UM72HSoI+evXfDiHvJ7czag6vd6CPk9+S8/YK/C5JEciLtG7ooyaTr6hwHotG9zScGLe9DnycmA6nDU2OLuzWPJX7twmDf7kbtzddKvkbuiTJqugTAAnfZtLpkV4p6LSDocibniPpSvGapuKqQl7p4siXssZtz3dFAjd0WZNJ394YTbXOLL+StOM0GfN2epkAGf4PXk0ZZxS/567NvsZMvE21zquSvK5BmJ3NWWmTBBvyennntRXrNlRmrLgBW5Z2OGavyAcb9G7ooyaRxx78pD5F744u7z5ijPPc5zz3P5AScV0ueRrJQHiI/W85UZpCizgU53QFVtmQljRe65GFC1InfL489vnrszoOqR7ETu8eKu2TKKMnk6Z7ItIyILReRJEdkpIttF5JP29moReVRE9tq3VXHP+YKI7BOR3SLy5mx2IGcDqnaee9DvyVtVSGeGqt/x3L1ZitzjBF2zZRRl8jh2TNdAJOevnYktEwE+a4w5E7gYuENEVgOfBx43xqwAHrf/x37sFuAs4AbgeyLizUbjwRlQzV22TJHfy3AkP9USo6Mid2+WIvf4KyGN3BVl8ox47jMwcjfGHDfGbLbv9wA7gfnAO4G77N3uAm60778TuMcYM2SMOQjsA9ZPc7tdQjmyZcJx5QcgP9US4xfrcG6zkQo5MJz/VacUpdAxxsTZMjN8QFVEFgNrgZeBBmPMcbBOAEC9vdt84Ejc05rtbaOP9RER2SQim1paWibRdItcRO7GGNdzD9llhvORDjl6QDVr4q7ZMooyZXqHIkRjhvKQj/7haE6C0HgyFncRKQXuBz5ljOkea9cU25IUyBjzfWPMOmPMurq6ukybkUTQ52E4EsNkqToixNVR98hI5J6HiDYSs5b6y3bk7vQt4PVonruiTBLHkllcW5Lwf67ISNxFxI8l7D8xxvzC3nxSRObaj88FTtnbm4GFcU9fABybnuYm466jmsXo3ano5vdZJX8hP5N7IjHjDqaCNZkpG5OYnL5VlfjVllGUSeKkPzbW2OKe43TITLJlBPgBsNMY889xDz0I3Gbfvw34Vdz2W0QkKCJLgBXAK9PX5ERC9jqq2cx1D0dG7JBcrts6mkg05kbtkP1JTFXFAbVlFGWSOJF6Y3UxkHvfPZM898uAW4E3RGSrve3/AF8D7hWR24Em4N0AxpjtInIvsAMr0+YOY0zWFGIkco8C/qy8xrAduQe8QsifP889HDVupgxkbxKTI+7VJYG8TL5QlNmAK+41trjn+Lc0rrgbY54jtY8OcE2a59wJ3DmFdmWME0nnxJaxUyGBvBQPi8Ri7mAqZHESk50tU1US4ETX4LQfX1FOBxwxdzz3XJf9nQXlB+Ij9+wQL+7OgGpePPeocRfqgOyWHwh4PZQGfGrLKMokcWalOrbMjBxQnckEc5CaGD+gOpItk4dUyFiiuHs94pYkmE4Gw1FCfnu9WB1QVZRJ0TUQJuDzUFsaxOuRnNsyBS/ujthmN3K3ouNEzz0/A6q+OFvG6xGyMVF2MBylKODN68IkilLodPWHqSzy4/EIFUX+nNeXKXhxd22ZXETucZ57XmaoxhIHVLMVuQ+Eo4T8XooDVqmFbK32pCizmc7+MBVFVpJHZZFfI/eJEvTndkDVeb181FyJRGMJee5ej5AFbbcid7/XPZFp9K4oE6drIExlsSXuFcV+9dwnSi4GVIfj8txDOZg0lY5ozCTkuVuFw7IRuccI+b2EAvk7kSlKodM5EKaiKABYkbuK+wTJ5YBqwCcEvB48kh/P3VowJE7cvUI0G5H7sD2gmserFEUpdLr6h0cid7VlJk5uBlRHbBkRq75MvmrLJAyoirhlgKeTwYhlyxTnsdSCohQ6VuR
ue+7FAc1znygjtkxuPHcgb1kk4WhyKmR2Sv5a2TLquSvK5BiOxOgfjlJZNBK5dw9GcpqcUPji7s9+bZnhUaV2Qzla/Wk00RTZMtkq+Rvyed2rov7h3K8ioyiFjOOvO7aMc9udQ9+98MXdl/2883DEqS1ji3sgT7ZMNIYvoSqkZKUq5GA4SigwYstoZUhFmRhddk57eVGiuOeyeFjBi7vf68HrkRyV/LWi5pAvP+I+ekDVk7V67jErFdLNlsnPmrGKUqiMRO5Otox1m0vfveDFHazoPRcDqk7UHPLnx5ZxVoNy8GWh5K8xhoFRee5qyyjKxHAyY1zPXSP3yWGJe/Y990DcgGo+Ive+oQilwZFCnl6PYAzTWjwsHDVEY8atLQNqyyjKRHHEPX6GKuR2wY5ZIu7e3JQfsG2ZIr83K+UHdhzrZu3f/p79Lb0pH+8dilAaihN3sdoznb6706+QzlBVlEmTPKCqtsykCPo9Wa314gyoJqRCZmFiz/++dJiO/jC7jvckPWaMscQ9PnK3/ffp9N0H7X45hcNAF8lWlInSORBGBMpClriX20GZ2jITJJSjyN3JMQ9mwXMfDEd56DVrqdnW3qGkx/uHoxhDorjL9Iu7E6WHfF68HiHo00WyFWWidPUPUx7yu+VCfF4PZSFfTmepzgpxD/qzO6A6HDUE7NmpYNky0/16j2w/Qc+QNXCZStx77ccSbBlPFmwZ+6Tl+O1FAa8bzSuKkhldcbNTHSpzXDxsdoh7lgdUw9FYQgqiNaA6va9336Zm5lcWUVsaSCnuPYO2uAdTiHt0+iN3x28v8nun1ZY53jXA3/56hw7SKrOazriKkA6VRQE61HOfGEGfN/vi7ht5q4oDXvqHp28q8dHOAZ7f38q7LlhAbWmQlp7kL0DfULK4+7IQuTtjCc7C49O9GtM3frebHz5/kEd3nJy2Y06WnsFwzut9FBLP7GnhSHt/vptRkHT0p47c1ZaZIFbeeXZXYorPL59fWUTMWFHodHD/q80YA++6YAF1ZcGxbZk4cfd4Rjx3Y8y05KM7A9Pxkft0vbd7T/bwwNajAPxu+4lpOWamRKKxJCvti7/azh/ftSmn7SgUjDF87Mev8r2n9ue7KQVJZ/8wVXaGjENVsUbuEyYXkXsgTtwba6zVzA+1Tk9U89Drx1i/pJqF1cXUlqYWd9eWCaWI3GOGx3aeYt3fP5axpzcUiXLnb3aw/VhXwvb4bBmYXlvmm4/tpdjv5Yaz5vDUrlM5tWa++ttdvOc/XkrYtvdUD7tPJmcmKdA9GKFvOMqxzukJYE43OvvDVI2yZaqK/XT0qbhPiFzMUI333BfXWquZH2rrm/Kxj7T3s+dkL9evbgBwPXczympxIvey4MgXxhOXLbO/pZf+4WhGVxPGGD5//xv857MH+fmrzQmPxWfLwPTZMjuOdfObN47zocuWcMv6hfQNR3lhf+uUj5spmw63s+N4d8L7eqJrkJ7BCN2Dua2zXQic7B4Epu/q9HQiGjN0D4apGBW5VxYH6B6MEMnGIgwpmB3i7vdkPRUy3pZpKAsR8ns41Dp1cX9y9ykArjqjHoDa0iCD4Rh9o6LlvhTZMr64PHcnYu/oG1+ovvnYXn655SgBr4c9oyLXpGyZacrp/86TeykL+fiTy5dy6bJayoI+frctN9ZMLGbYd6qX4UiMNjtyGopEae217h/tUAEbzYkuR9wH89ySwqNrIIwxJEXu1SUB9/FcMDvEPcu2zHAk0XP3eITG6hIOtU3dlnli1ykW1xSztNayempLgwC09iRaM07kXhL0jrTDjtwj8eI+jqf3xK6T/Ovje3nXBQt4x3nz2H0icTasG7n7py9yH4pEeWLXKf5w7Xwqiv0EfB6uOqOex3aeykkUc7x70LWWHJvhVPfI+6vinswJO3LvGYy43z0lM5zf4GjP3cmeyZXvPjvEPesDqonZMmBZM1O1ZfqHI7ywv42rz2hwc+hry2xxH+W79wxGCHg9BH0
j4u4UMouZEXFvH8fTe/lAOwGfh6/etIYz5pTR2jtEW9xrDY5KhSwOJEfuxhjuf7U54/d8S1Mng+EYG1bUudtuOHsO7X3DbDzUkdExpsLeuKsTR9zjveSj6isncTIuYj+h0fuEcDKwRqdCOmLfkaOMmdkh7nbkPtqnni6sAVVJ2La4poSmtv4ppUO+sK+N4UiMq21LBizPHZLFvXconGDJADgXE5GocRcBGG/AxhnoCfg8rGwoA2DPyZHofTAcxSO4YwypVp3a3NTJZ+97LWNb5fl9rXgELlpa7W5708o6Aj7PpFIijTH85zMH2Hcqs8HQfadG+nes0xIqJzIFFfdUnOwZeX/Ud58YjjWaKlvGelwj94xxFuwYztIl/mjPHayMmeFobEpf/Md3naIk4GX9khHRq7NtmZbexC9A72BiXRkAb6rIfZxLvs6BYbe29BlzLHHffaLbfXxg2Cr3Gz8bd3Tk7kTCTRnmQD+/r5VzFlRSHhqJZEqCPlY1lKUtkjYWO4/3cOdvd/LD5w9ltP/ek71UlwQI+T1uxO54yfVlQbVlUnCia8j9vqnvPjGc+jFJ4l6itsyEyfY6qsOj8txhJGPm8CR9d2MMT+46xeUrrAjWobokgEgqzz1KSZK4W7cJnnsGkbtTW7quLEhlsZ/dcZH7QDjqDqaCJe6RmHHr6wDstSPhTCa49AyGea25iw3La5MeaygPuVkZE+FBuwbP5sOZWTr7WnpZUV/KvMoijtkn4xNdg5SFfKxsKKNZI/ckTnYPsmZ+BaC2zERxbJkKtWWmTijL66iGI8mR+2In132Svvuek72c6B5MsGTAKjBUVZxcgqB3KExZmsg9GjPuzLf2cb44XQNht7a0iLCyoSwhY2YgHE3w9R2hj891d2yOIx2pxb2lZ4gW++T08oF2ojHDpctrkvabUxFMsEfSsbmpwx1LMMbwa1vcd5/sGTfzwBjD3pM9rGgoZV5FkWvLHOscYG5FiPmVRRq5p+BE9yCLqoupLQ2oLTNBOvqH8XrErQTpUBzwEvB6NHKfCNleR3V0njvAnPIQQd/k0yFfa+4E4ILFVUmPpaovM7qWO4xMYopEY26u9njT6Tv7E2terGooY8+JHne8YigcS4zcUyzY4Yp7e+of/R13b+Zt336WY50DPLevlaDPw/mLkvs5pzxEZ394zM9tz8ke3vVvL3D7XRuJxQybmzo42jnAzecvwBjYeqRzzP629AzRPRhhRX0Z8ypDri1zonuQuRVFzK8qorV3SGvdxBGOxmjtHaKhIsScipDaMhOko98KoBxr00FEqCrJ3USm2SHuTuSeJVsmlefu8QiNNcWTTofcfrSLkoCXJfYVQDzWLNXxPXcnFbLTzquF8bNlOgeG3YUDAFbOKaNnKMIx+wfsLLHn4C7YYUfufUMRjnYOEPJ7ON41kGDXgJX2uLWpk5PdQ3zoRxt5ek8L65dUu1dX8TSUh4CxL/u//vAuPCJsaerkJy8f5sGtxwj6PHzuhlV4BF4dx5pxLKTlti3T0jvEcCTG8a5BN3IHZsVMzCd3n+LhN45P+TjWJDpoKA8yp7xIbZkJ0tk/nJQp42CVIFBbJmNGPPdsRe7JnjtYg6qTjdy3Hetm9bxytz5MPKlKEPQORZMjd/tqwhH08pBvzKhgMBxlMBxLKGjkDKruOWFZMwPDUUL+xCJpMGLLOAOgly2rterrdCb+8Lcd7WY4GuO2SxrZ39LLwdY+Ll2W7LcDzKmwxT2NNfPi/jYe33WKz1y/kg3La/n673bz69ePc/UZ9TSUhzhzbrnru/cNRXjff76UNOvVGfxdUW/ZMsZYdlJr7xBzKkLMs8V9NmTMfOvxvXzhl28knXAniiPmc8pDCVc7SmZ09IWTBlMdrOJhMyRyF5EfisgpEdkWt61aRB4Vkb32bVXcY18QkX0isltE3pythseT/QHVGAFfsggvqS3hcHv/hNcwjcYMO451c7Y9YDWa2tJgigHVcNrI3RH3JbU
l9A1H01oMjj8dL+4r6+2MGVsEByPRhCg7NGqpPceSudIeKxjtu29pssT2jquW8/WbzyHk93DtmYnjCg5z7Mg91aBqLGb4h4d3MrcixIcvW8KdN51NOBqjvW+Yd5w7D4ALGqvY0tRBJBrj7pebeGF/W1J65r6WXspDPurKgq6Qb23qxBiYWxFiQZUt7rPAdz/SPkBnf5iXD7RP6TjO59FQbtky3YMRd4a0Mj5Wud/U4l5dEhj36nq6yCRy/2/ghlHbPg88boxZATxu/4+IrAZuAc6yn/M9EUm+Hp9mHAGaqG/ak2FNkVS2DEBjTbF1iT/BjI+Drb0MhKOcPS+NuJcF6BuOulZIOBpjMBxLEnfHc3e+LIvtWa7pyoq6K7LHXTJWFPuZUx5KiNxT2TKDceLu8wiX29kvozNmtjR1Mr+yiPryEDdfsIBtX34zK+x8+tE0VKS3ZX6/4ySvN3fx2etXEfJ7aawp4fNvOYPFNcVuqYYLGqvoG47y+tEuvv/sAQC2H+tOOM7ek70sry9FRJhbab3eq/YJaE5FEXMqQnik8CP3geGoe7X38LapWTNu5F4RYu44V1dKMmPZMpXFgZyV/R1X3I0xzwCjQ4F3AnfZ9+8Cbozbfo8xZsgYcxDYB6yfnqamZzKR+5O7TnHB3z2WsgLjaFJly8BIxszhCVozbxy1KjGOFbnDyESmVLXcYWSxDseKcdqTLjJwZ84VJUYVK+eUscsW98FwYuReHLBe07Fl9p7qZUltCQuqivB6JCly39zUwfmNI4OnvhTvm0NZ0EdxwJtSOH637Tg1JQFuWjvf3fahy5bw1F9e5bbPGaT9mwe20dIzxFnzytlxrDthYtm+U72ssK9O5lVYUbpj5cyrCOH3emgoDxW8uDfbn0NxwMsj209OaXLdie4h/F6hujjAXPs9G22/Kenp6B9OqivjUFXst8fIsjPhMp7Jeu4NxpjjAPatc909HzgSt1+zvS0JEfmIiGwSkU0tLS2TbIaFk7o3kVTIzU0dDEdjGQ0WpfPcnUj54ATTIbcd7Sbo87CsLnkwFeInMlninqrcL4yIe5sbuVu59+lSrToHkiN3gDXzy9lzsofmjn4Gw7HEyD1g9duxZfafsiJhn9fDvMpQQsbM8a4BjncNsnZh5VjddxER5qTIdY/FDM/sbeWKlXVuH1OxoKqI+rIg2491c/6iSm67dDED4SgH7ZNta+8QbX3DLK8vtfvipbok4FpQjuc/G9Ihm+32/3/rFtLaOzTuQPNYnOoepL4shMcjbuSu6ZCZ4YxrpbNlqooDdtXI7Ntc0z2gmuqXmPIUZYz5vjFmnTFmXV1dXapdMsZZNWgiA6oHWiwB6M4gT3o4RfkBgLnlVnVI51iZsu1oF2fOLU8b1TqRu5Mr3jc8TuTenxi5pxP3VJ47wPsvakQEvvfU/qRJTLWlQfxe4endLQxFohxq62OFLZYLq4oTIvctTZ0ACZH7eMypCCWdYLcd66K9b5g3rRz7eyEirLNTST9x9XLX5nJq1L+wvw1ITDedWxFyFxp3VqafX1U04yP3zv5hvvG7Xay/8zFeOZjsqTufwx9d0kjA55mSNXOie5CGcus7mElGkzJCuqJhDrksQTBZcT8pInMB7NtT9vZmYGHcfguAY5NvXmaEJpEK6WR9jHcGjdiXt6kid49HWFZX6qbbZULMHUwtT7tPbVlifZneFOunQlzk3juMzyPMtwcH031xulJ47gDzKot4z4ULuW/TEXqHIgm2TGVxgNs3LOX+zc08sOUoMQPL4sU9LnLffLiDoM/D6rnp+zYaK3JPtMae3t2CCFy+InWWTTwfuLiRD166mKtW1bOioZSA1+P67s/saaGiyM+5CyoT+gojUTtYkfuJrsFpWzYxnl0nujMe20nHb14/zuVff5LvPbWf9r5hfrG5OWmfI+39BH0eltSWcMWKWh7ZdoJozBCJZlZz6Zdbmtl4yDppnOgedN+fkN9LTUnATZXNVS1yh1z
YF9PJSF2ZNLZMDksQTFbcHwRus+/fBvwqbvstIhIUkSXACuCVqTVxfFzPPcMB1VjMuDNLx/vhRezFp0dXhXRYUV/K/gmIe1N7Pz1DkbSDqQA1JU7ZX+sL0JOiljuAV0Yi94oivxsVtKep6d45YM2cG32SAPj4lcsBK5MnPhUSrKi4vizIlx7cDuB62AurrQlAzsDv5qYO1syvSCinMB4NFZYtE59x9PSeFtbMr6DGvoIZi0uX1fLld5yFiOD3ejhjbhnbjnZhjOGZPS1sWF6bYO04ee1z48W9qohIzCTZQ1MVlqf3tPDWbz3Hx3+yedLH6hoI81cPvEFjbTEPf/JyrlvdwJO7TyUdr7ljgAVVRYgIN5w9l2Ndgyz7P79l+V89zKd+tnXc17nzN7v41D1bGYpEOdk16Ebs4FxdDXCwtY9LvvYE//505kvvHWnv5+UDbQnbvvvkPt753ed5avepNM+yrsJv/cHL/MV9r2f8WjOBkYqQ6VIhA/Z+2R9UzSQV8qfAi8AqEWkWkduBrwHXiche4Dr7f4wx24F7gR3A74A7jDFZn/o30QHVY10D7qIU40XuTjGyVJE7WJNjjnYOZFzzetuxsQdTAQI+DxVF/qTIPbn8gCVa/cNRKor8+L0eK9c9neeeZuYcjETvQILnDtYVw+ffcgaD4RgisNQeK1hYbXn8zR39DEWibDvWPSFLBqzIPRIz7rhBV3+YzU0d41oy6ThrXjnbj3Wz60QPp3qGko7jiPqc8sTIHeCLv9rGP/9+N1/61TZu+OYznPuV3/PbcSYFtfQM8e3H9yZF/btOdHPHTzZTHPDy7N5WHt+ZXsjG4t+e2k/XQJiv/eE5nDm3nKvOqOdk91BSVtCRjn7383j7uXP5wlvO4NPXruS61Q38auuxMQu0xWKGjv5hjnYO8J/PHKBvOJog7nMrQhxo7eP2uzbS0jPEfz17gOEMf2v/+MhuPvijjW62lTGGu19u4rUjnXzwRxu59Qcvp0wA+Mqvd/Ds3lYe2Ho0o6SHmUK6cS2HajcAmwGRuzHmvcaYucYYvzFmgTHmB8aYNmPMNcaYFfZte9z+dxpjlhljVhljHs5u8y3cAdVRX7gj7f0pvcL9cR75eJG7MyEklecOsNyOYjON3t842oXfK2653XTElyDoTRe5x0Wk5baPPlYebedAOKmYUTwfv3I55SEfi2yRiOfG8+ZzQWMVy+tKXdvGyRE/0tHPs3taGY7EOH9R5Zj9Gk3DqFz35/e3EjNMQdwr6BoI89NXmgC4fGWitePYMnPtW4C1i6p408o6dh7v4dtP7uPeTc3UlQVZWF3MJ+/ZMmaE+YPnDvJ/H93j5veDJfi3//cmSoJefvvnl7OsroQ7f7uT4UiMcDTGr187xqkMUguPdg7ww+cPctN5891g4MpV1vvy5K7ENh1pH3A/j6DPy5++aRmfvHYF//CHawj4PPzguYNpX6drIEw0ZvAIfOuJfUDiyW9uRRGH2/ppauvnjquW0do7zCMZLnC+/VgXA+GoO05wqK2fo50DfPFtq/ni21bz4v42/uXRPQnP+dnGJu5+uYk/WDOHaMzwm9enPus2V2Tsuc9gW2ZG4fcKIsm2zJ/8zyb++oE3kvY/YEcxIiOZKOkIZxC5Q2LN8CNjTGzadbyH5fVl41oXcyuKXJ+zz12FKXWeO4wMklaVpF9hvat/pGhYKuZVFvHq31zHW9bMTXrM4xF+9KEL+fEfX+RuW1hlnQR2nejhSw9uZ1ldiZuDnilzRuW6P727hbKQj/MyzLgZzVnzLL//ZxuPsLKh1E3lc5iXwpapKPJz14fX8/znr2b3372FN758Pf97+0Xc/ScXs6K+jI/++FU2HUoexIwvYhZf4+aeV5o41jXAD267kIXVxfz121ZzsLWPv/rlG7zlX5/lz366xbW4xuL//n43AJ998yp3W31ZiHMWVPBE3AmnezBM10DY/TziqS0NcvP587n/1eaERVnica6abr240Y3
I4yN356Tx9zeezWevW8XC6iL+96XD47Z/MC5z6andVkbcs3ut26vPqOfDG5bwngsXcs/GJne+xNYjnfzNA9u5fEUt337v+Zwxp4wHth4d97VyjTGGJ3efSrqCSTWXJJ6ykA+PzBBbphAQEUKjltpr6Rli14keN0UsngMtfZSFfNSXBcfNlglH0g+ogjWRye8Vd1D1aOcAV/3TU/zw+dSR0oHW3rQpkPEsqS3hQEsvxhj3BFQSGDVDNYW4VxePFbkPp/UCHdL1E6A85E/40deVBQn6PHzniX0c6xrg6zefk1BRMhOcCPFEtzWg+eTuU2xYXjtmfvxYnDm3HK9HGIrEuGJFcvR/9vxy3nfRoqRqnA4Bn8d97YoiP/9z+3qqigNuRBvPliOdbpbNa81d7vZXDrWzqqHMjbavWlXPlavquO/VZiLRGNeeWc8j20/QNEZdol9uaeaXW47yocsWu7aRw1Wr6tl6pNMV62Z7UHthiisugNs3LGEoEuPHLzWlfNz5vly7usF9X+IHnN970SLu/uOLuGX9Ijwe4f0XNfLKwfak9XdHs+dkDzEDIb+Hp/ZYJ6Nn97aysLqIxhqrrZ+4ejkiwrce30tr7xAf+/Gr1JUF+dYta/F6hBvXzmdLUyeHp2Ex+ulky5FOPvSjjdz36pGE7R19wxT5vSlrKYH1m60qDoy77sJ0MCvEHZKX2nvJHsRpSyF0B1p7WVpXSnnIP27k7nruaSJtv9fKUHAi96d2nyISM/z0laakQa/BcJTmjgGW1pWO259ldSX0DEZo6R2idyhCScCblPOdNnJPO4kpnJQGORVEhAVVRfQPR7n14kbWLa4e/0mjqC0N4BHLlnlhfyuneoZ42znzJt2mkN/rnjyvSGHtBH1evnrTmoST1NjtC3LugsqU9VUeeu04AZ+Hy1fUsvWIZctEojE2H+5wUzQd/vFd5/Iv7zmXRz59BX9/4xo8IvzohdQBwK+2HuWz977GJUtr+PS1K5Mev/qMeoyxBmxhJA3SibBHs7y+jKtW1fE/Lx5KOYu7vc86SVSXBPjKO87iz69ZQWPciaI85OfSuHr8775gAQGvhx8+d5C9J3vY3NSRMg1553FrXOCWCxdxoKWPAy29vLi/jQ3L69xxn7kVRdx6cSP3b27mQz/aSHvfMP9x6wVU2YtJv90uNfGrrVlPupsQL9pptk/vTpyj02GvdDYWuaovM3vE3edJiNxftMW9vW84ySI50NLHstoSykI+eoam5rmDZc04S745l5/7W/rYHOfDgpUpYwwZRe7OCeBAS59VETKUnOGSKnKvKvanrTrXNc3iDrCyoYz5lUV87oYzJvV8n9dDXVmQE12D/PzVZspDPq5JU4smU9bMr6TIn7jC1VRIlYsfjRkeev0YV66sY8PyWo60D9DWa10t9g1HuXDUia6uLMhNaxcQ9HmZUxHi7efO496NR5Lq0T+56xSfufc1LlxczX/dti5lBLhmfgW1pUGesH13x9JIZcs4fOiyJbT1Dbvfz3icAKimxBpn+Mx1K1MWtHOoKQ3y1nPmcs/GI1z3L8/wh997IaWnv/N4DyUBL7de0gjANx/bS+9QhCtGpbh+7MplhPxe3jjaxdduXpOQbDC/soj1S6p5YOvRGZUW6QSPL+xvSyjUZpUeGPvquKo44KZMZpNZJO6Jtozz5kdjxh3BBmtR6uNdgyytK6G8aPzIfTzPHazIqKm9n57BMC/sa+WmtfMpDni5d2NiPrLj9S+pHV/cnVzy/S299A5Hkvx2SB+5D4SjSUvjhaMxeoYiab3AyfL1d53Dr/9sQ8r0ykyZUx5iX0svj2w/wdvPnZf2kjZT/vLNq/jJn1w05eO47asI0TsUSciI2nionVM9Q7z93Hmca48PvN7c5eaKjxb30dy+YQl9w1HueSXRKvnmY3torCnmhx+80C39MBqPR7j+rAYe3XGS1t4hmjsGKA36xvxsL1lWQ1nIxxO7ktesbbfLSzs52JnwhT8
4g3/4wzV8531rWVpb4kay8ew43s2qOWUsqyulsaaYB187hkdIqhJaWxrk6zefw1fecRY3rV2QdJwbz5vPgZY+t0RGNnhgy1Fu/cHLGc11GI7E2HSog4XVRfQORdzJe2AlLYz3PlYWpx8Xm05mjbiH/B730vBk9yAHWvo4Z4EVAcSnUjmzSZfWlVIW8o/vudviPpYHvLy+lJiBezc10zcc5Q/WzOWta+by0OvHEqrpOVk6mYh7/OzX3sFIUhokjFSFhETPHZJH451+jjWgOhnKQ36qS8aOVMajoTzElqZOBsMx3nVB8o97osypCKVcHGTSx0sxS/PXrx2jyO/lmjPrWTO/Ao9YPuzGQ+3MryxyB27Tcfb8Ci5ZWsN/v3DI/d7uOdnDa81dvG/9opQn83g+fNkShqMx7nrhEM0d/W6Oezr8Xg9vWlnHE7takq5k2/qGKQv6JjReUl8W4r3rF/G2c+Zx2fJaNh/uSJjgZIxh5/FuzrQntF21yroaO2dBZcqMrbefO4/bLl2c8rUuXWat4vXaOAuzTIUfPX+QZ/e28vsMsoDeONrJQDjKn129Aq9H3EFisH53o2s3jaa6xK/iPhGCPq9bW8aJ2t92jpX1ES/uTr7v0jrblhnPc3cHVNP/cJzp+D96/iABr4dLl9XwngsX0jcc5TdxedIHWvqoLwu6097HwuMRltSWcqClN+UqTJAYuZfHRe5g2VGtvUNuGtlI/u3UhDgbOIN3S+tKJp0lk01Gp2uCNTB4xcpaigM+SoLWWqxbj3Sy8VAHF6ZYXSsVH79qGce7BrnnFWtQ7r5NR/B5JKFYWjqW15fy5tVzuOuFQ+w52cuCMSwZh2vPbKC1d8hdBcyhvW+Y6tLJfy/WLbaqc8ZH1se6BukZjHCGLe5vslM4M5l1PJrGmmLKQj634F4q+oYifPW3O7n+X56ecP35o50D7oB4upTRlw+0ueMVL9klla87s4HzFlbyzJ4RcR+90lkqnAU7sm0zzSJx9zAYcd78NspCPndALX5VowMtfYhYdVicAdVUb/J/PnOATYfa4zz39G/VktoSPGLNErxwSRUlQR8XNFaxtLaE+zaNjKYfbO11JwBlwrK6EvY7nnuKSM6bKlumZCRy/9zPX+eOuzfT3NHvpl6NleeeLxzxvPn8BWNGn/lidLpmOBrjaOdAwlyFcxdU8tL+Nlp6hjIeWN6wvJaLllTz7Sf20T0Y5pdbjnL1GfUZzcwF+OiVy+gejNDU3s/C6rGvFMDKkfd6JGlCVXvf8JSuvpz+xqeL7rQnWa2ea71Hly2r5U8uX8It6xdN+Pgiwlnzytk2auIWWMJ878YjXPfPT/P9Zw5wsLWPL2eQZhqPE63/0SWNbDrckTBnAaxc/fd8/yX+9qEdgKUvZ8wpo6okwOUrann9aJc7ttfZP5w2x92hsjjAcCTmFuPLFrNG3GtLg2w+3MnPNjbx4v42LlpSTUOZ9aOMz+890NrH/MoiQn4vZSEfw9FY0uSn4UiMf3h4J//86J6MPPeQ3+tO/LlypXX5KSK847x5bDrc4b7+gda+jDJlHJbWldLc0U9b3zClwWRRFhEcfXeiBeeLdf+rze6A2+amzqzZMtPBOQsqqCr2c/P5U7dkskF8uiZYS/JFYyYh9fC8RZVuZlWmA7kiwl++eRWtvUN85H820do7zLvXLRz/ic5rLqx0LYuxBlMdKosDXNBYxeOjJkC19Q1TMwVxn19ZxLyKEBvjKlE6mTKr5liRe8Dn4a/eujoprTNT1syvYOfxbvf3uL+llyu+8SSXfe0JPnf/65SF/Pz8o5fw2etX8fsdJzOeZAXwu20nWNVQxuduOIOyoC8pen/Ivvq9++Umnt3bwqZDHVy81Hrfr1hZhzHw3L5WegYjxEz6HHeHatuTz/Ys1Vkj7l98+2rOW1jJ/3//Gxxq6+fipTVUFPnxeiTBljlop0EC7urk3aNmqTZ39BMz8PLBdreo1VjiDiMzVZ3LT7B8RmO
sS/j2vmE6+8MszcBvd1hWV0LMWLZSaTC1H+rzjORkw0jk/sDWYyytLaE44GXz4Q46B8aueZFPLl9Rx+a/uS4ht3omURTwUh7yuZH7YTs/PT5d0ClOVlHkZ/kETuDrFldz1ao6XjrQTm1pwJ2Bmil3XGXVBBpvxrPDtWfWs/N4d0IVzPa+oSmPm6xbXM2mQ+3uVfDOE9001hRPaaA9nrPnVzAcibkpxz/beITjXQN86e2reejPNvDbT17OusXV3L5hCWfMKePLD27PqCRIa+8QGw+18+az51Aa9HHL+oU8vO2E+/4YY82QXb+kmkXVxXz8J5sZCEddcT93QSUVRX7u23SER3dag9XjRe71drDQ1J5+nsN0MGvEfV5lET/544v44ttWs7SuhOtXz8HjEWpKArTZtowxhsOt/Sy2J1A4PvVo39358UZjxi2dmmqZvXiuPbOey1fUuv47WNFGTUmAJ3efcjNlJmbLjBwrlecOYGu7K+4VRX4cZ+OLb1/NOQsq2NzUMTJzbgZG7sCMtGPimVtR5Ebuzo9yUc2IuK9sKKXI72VdY9WYaYSp+Oz11gzUm89fMG4QMZrLltfy7Oeu4rLlNRntf/UZDQA8YQuRMca2ZTKzgtJx4eIqTnYPuZMGdx7v4cw5mVcHHQ8nPdLx3R/feZKLl9bwocuWcPb8Ctei9Hs93HnTGk50D/L1h3eNe9xHd5wkZuAtZ88B4LZLFyNYxc3AWtmrqb2fm8+fzz/84RpXKy6yr868HuEP1szh2b2t/MV9rwFW2utYXNBYhUdImWE0nUzPaXWG4PEIH96whA9vWOJuq4lbbLprIEzPUMS1UMqcyH1UxowzZbqq2M/z+6wFl8f70d2yflGSn+jxCG9aWceTu09xiX2mX1qbeVQXn1WTypYBK3L3eYy7kLXXI8yrKGL1vHKuXFXPxkPt/MfTB9zXL5+h4j7TcapXgiXuAZ/Htf3Ayqb69nvXpp0lOhZnz6/gwU9c5paymCgTec1ldSUsqi7m2b2t3HrJYroHI4SjZkq2DMT57ofb6RmMcKitj3eeN/nJaKNZUlNCScDLtqNdrF9czf6WPm69uDHlvhc0VvHhy5bwg+cOsmFFLW8+a07a4/5u2wkaa4rdheIXVBXzgYsb+Z8XD/FHlzTy0OvH8XmE61fPoaokwIcvW8Lhtj43cQHgqzet4c+vWcHhNmtsy7HK0lEe8nPOgkqe39fqntizwawS91RYBbisyN2JuBa64p4ucu+jNOjj5vMX8F+2/zbRiMrhTavq+MWWozyw9Sh+r6SdRZiKkqCPuRUhjncNpo/cxYnWR6LFX3z8UjeSP39RFRF7ZaPykG/MlY2U9MwpD7LL9pGb2vpZWFWUFKFfu7ph0sc/J67mfDYREVbPLWePPenO8X2nasusbCijLOTj7peb2HWih4ay0LSOoXg8wlnzKth2tMsdM3CuQlLxuRtW8crBdj7389c5e35FSq9/29Eunt/Xyu2XL0n4/XzymhX8cstR/v6hnRxu7+Oy5bWumH/x7auTjiMizK0oSqpjNBYbltfyb0/vp2cwnFH23GSYNbZMOmrjInf3ctoW9/I04n6orZ/GmmLesmbkjD9Zcb9iRR0esdKnGmtKJlwzxbFxUuW5gxUxjp512lAecifwrLXzvXce756RmTKFwpzyEK29Q0SiMQ6399NYk7m9NtNYXl9KU1s/4WhspPTAFFIhwbpivKCxio2HOqgrC3L/xy+d1FXMWJw9v4Idx7v5/fYTrKgvTbDFRhP0efn2e9cSicb41D1bkjLi+oYi/PlPt1BTGuBPr1iW8FhVSYBPXrOC5/a1cqR9gLeek1xIb6pctryWaMzw8oHkgnTTxWkg7iOee3LknnpA9XBbH4trS1i7sIp62z8bKxVyLKpKAm7udiaTl0bj+O7pJrV4RMa0WqpLAu4g7niTK5T0NFSEiBk41TPEkfb+lGWRC4Vl9SVEYobDbf3ub6N2ip47WMs1vnX
NXH7+0UsnnRUzFmsWlDMYjvHywXauOXP8q6TFtSV85vpVbDzUwaFRRdq+8uvtHGzr45vvWZvyquXWSxpZWltiWzKTvyJLx/mNlYT8Hp6zbd9sMOvFvaY0yEA4St9QhCPtA1SXBNwRfEfc42u6h6MxjnQMsLimGI9HXL9uIqsLjeZKe3beRAZTHRxhTpd14PPIuPVinAU0prv0wOmEkw6583g3vXHjNoWIEzDsO9U7YstMMXIHuG51A999//lTtnjSEb96Wab1h5yBzx1xOfKP7TjJvZua+cRVy7kkjT/u93r47vvP51vvXZuVDLOgz8uFi6t5Yb+K+6RxFptu6x3mSHt/wqViScCqrRxvyxztsHKYncvuT167gu++7/yERaMnilNGdVWG6WrxXLS0htrSYNqoP+DzjFuFzpmKP91Fw04nnDRNZ9GJxjEsgZmOkwq8v6U3rmjYzL+qW1pnZSRVFvtZm+FM5hUNpfg84i6aDvDwthPUllrWy1icObecP0ixtsF0sWF5LXtO9ma0cMtkmPUDqjV2RNLSO0RTe79b5AmsQZrSYGIJAmdtVUdMa+0KeFPh7PkV3P+xS91aNxPhzLnlbPrra9M+/rWb11BfNnZ++AUauU8ZJ3J/xZ6FWciRe2nQx5zyEPtbeqkqDlAcSF9/fCbh9QhvP3cutaXBjMeugj4vy+tL2XF8JHLfcqSDtYuqJr1mwHRxmV1G+fn9rSkLpk2VWR+519mR+6nuQY52DrBo1DTt8qLE4mGH7DTI6Y7MLmismvSg7Fhcuqx23BS6FfWlLKsrYfXciZ9cFIvqkgABr4c37Bok0z1YmGuW1VulLaZaeiDXfONd5064vPTqeeWuLdPZP8yBlj7WTnA5yGywem45lcV+nt+XnXz30yZyf+NoF9GYSYq4ykL+hEWyD7X1UxzwuieF2YDHIzz+2Svz3YyCRkSoLw/S3DFAQ3mwICLdsVheV8ovNh+loshfEJbMVDhrXgW/2HyUlp4hd4H6tQunr2roZPF4hL+4flXCko/TyewXdzsLwFk4Y3TEZVWGHIncD7f10VhTMuNnTCq5Z055iOaOARqrCzcN0mFZfSk9QxF2He92152dray2K1PuON7NlqZOPMKkLNJs8IE0E7Gmg1lvywR8HspDPl53LqdHFVgqHxW5H27rZ0ltYV9yK9mhwY6wCt2SgZGMmVM9Q1MuPTDTccX9WDdbmjpYNad83Hr5s4FZL+4AtWVB+oej+DySdAlUHhe5R6Ixmgp8goqSPZxB1ULOlHGIr1tUMw1pkDOZimI/8yuL2Ha0i61HOmeE354LTg9xtyOT+VVFSSPk8Qt2HOscJBIzbmExRYnHCQxmg7g3lAcpsdN7C2lAdbKcNa+cJ3efomcwknEaZaFzeoh7mfXlTZW+Zq2jaq2K4qRBLtbIXUmBM+tyNnw/RMRdp/d0EPfV88rpt9cVXjuNSzDOZE4PcbczX1J5pWUhHzEDfcNR9p7KfAFr5fTj2tUN/PsHLpgxg3FTxak7P9uzZWDEdy8P+Sa0pkIhM/tHFRjJmEkVuY9Uhgzz4v42GmuK3WL6ihKP3+vhhrPTl48tNE6nyP0sux78eYsmXm+/UDk9Infblkm1FJlTGbKjL8xLB9q4dNnEF/BVlELk0mU1LKwuOi2uVOdVhDhvYSU3jFHbfbZxWkTuTsS+siF5JqdTPOz5fa30DkXYsFzFXTk9WLuoimc/d3W+m5ETRIQH7rgs383IKaeFuG9YXstjn3lTymn6jrg/vO04IqStEqcoilJInBa2jIikrb/i1ELf3NTJ6rnlp4X/qCjK7Oe0EPexKItbvk4tGUVRZgunvbiXx61feKmKu6Ios4TTwnMfi6DP4y6hd+Hi02Nyg6Ios5+sRe4icoOI7BaRfSLy+Wy9zlQREcpCPtYuqqQ4cNqf6xRFmSVkRc1ExAt8F7gOaAY2isiDxpgd2Xi9qfLZ61edFrm+iqKcPmQrVF0P7DPGHAAQkXu
AdwIzUtzfd9GifDdBURRlWsmWLTMfOBL3f7O9zUVEPiIim0RkU0tLS5aaoSiKcnqSLXFPVbzBJPxjzPeNMeuMMevq6uqy1AxFUZTTk2yJezOwMO7/BcCxLL2WoiiKMopsiftGYIWILBGRAHAL8GCWXktRFEUZRVYGVI0xERH5BPAI4AV+aIzZno3XUhRFUZLJWmK3Mea3wG+zdXxFURQlPad9+QFFUZTZiIq7oijKLESMMePvle1GiLQAh6dwiFqgdZqaky9mQx9gdvRjNvQBZkc/tA9j02iMSZlLPiPEfaqIyCZjzLp8t2MqzIY+wOzox2zoA8yOfmgfJo/aMoqiKLMQFXdFUZRZyGwR9+/nuwHTwGzoA8yOfsyGPsDs6If2YZLMCs9dURRFSWS2RO6KoihKHCruiqIos5AZKe4i8kMROSUi20Zt/zN76b7tIvINe9t6Edlq/70mIjfF7f+Uvb/zeH0++yAi54nIS3ZbNonIent7jYg8KSK9IvKdUce5QETesJcr/JaIpCqnPFP6MdZnkbd+TLAPi0VkIK4f/z4T+jDRftiPnSMiL9q/lzdEJGRvf4+IvB7/O5qJfRARv4jcZbd9p4h8Ie45efttj9GPc+33+w0R+bWIlMc99gX7e7NbRN4ctz0gIt8XkT0isktEbp62RhpjZtwfcAVwPrAtbttVwGNA0P6/3r4tBnz2/bnAqbj/nwLWzaA+/B54i33/D4Cn7PslwAbgo8B3Rh3nFeASrBr5DzvPn6H9GOuzyFs/JtiHxfH7FfBn4QNeB861/6/BKuJXAzQBdfb2u4BrZmgf3gfcE/fdOgQstv/P2297jH5sBN5k3/8w8Hf2/dXAa0AQWALsB7z2Y18B/t6+7wFqp6uNMzJyN8Y8A7SP2vwx4GvGmCF7n1P2bb8xJmLvE2LUoiD5Ik0fDOCczSuwa9wbY/qMMc8Bg/E7i8hcoNwY86KxPv3/AW7MZrtHM8F+pPws8t2PifQhHfnuA0y4H9cDrxtjXrOf22aMiQJLgT3GGGf5s8eA6YsWx2GCfTBAiYj4gCJgGOjORTvHI00/VgHP2PcfZeR9fSfWSWrIGHMQ2Ie1FClYJ4F/sI8ZM8ZM20zWrFWFzAIrgctF5E4sEfwLY8xGABG5CPgh0AjcGicwAD8SkShwP9YZMp/i/yngERH5J6yz9KXj7D8fa+ETh6TlCvPEp0jTj1SfhYjMxH58ivSfxRIR2YIlJH9tjHmWwvssVgJGRB4B6rDE5RtYwnKGiCzG6sONQCDHbR7Np0jdh59jCeNxrMj908aYeEGdSb9tgG3AO4BfAe9mZMGi+cBLcfs1A/NFpNL+/+9E5EqsiP4TxpiT09GYGRm5p8EHVAEXA38J3Ot4nsaYl40xZwEXAl9wvEXg/caYNcDl9t+tuW92Ah/D+oIuBD4N/GCc/cddrjBPpO1Hms9iJvYjXR+OA4uMMWuBzwB3297pTOwDpO+HD8vqe799e5OIXGOM6bCf8zPgWSyrIzL6oDkmXR/WA1FgHpad8VkRWWo/NtN+22BF4XeIyKtAGdaVBqT/7viwVql73hhzPvAi8E/T1ZhCEvdm4BfG4hUghlWQx8UYsxPoA862/z9q3/YAdzNyKZQvbgN+Yd+/j/Hb04z14TvMlOUKx+3HqM9iJvYjZR/sS+c2+/6rWNHUSmZmHyD9Z9EMPG2MaTXG9GOtrXA+gDHm18aYi4wxlwC7gb05bvNo0vXhfcDvjDFh24Z9HlgHM/K3jTFmlzHmemPMBcBPsb47kH7Z0TagH/ilvf0+7M9oOigkcX8AuBpARFZiXUq2irWUn8/e3ojlex0SEZ+I1Nrb/cDbsC6b8skx4E32/asZ50dljDkO9IjIxfZVyh9hXfLlm5T9SPdZzNB+pOtDnYh47ftLgRXAgRnaB0j/nXoEOEdEiu3P5E3ADgAns0REqoCPA/+V0xYnk64PTcDVYlGCddW+a4b+tuPfVw/w14CTafUgcIuIBEV
kCdZ36hXbRvo1cKW93zXYn9G0MF0js9P5h3XWOw6Esc56t2OJ+Y+xPsTNwNX2vrcC24Gt9vYb7e0lwKtYGQPbgX/FHqHOYx822G16DXgZuCBu/0NYAzS99v6r7e3r7D7vB76DPat4JvYj3WeR735MsA832314ze7D22dCHyb5nfqA3ZdtwDdGHWeH/XfLTO0DUIoVzW632/qX9va8/rbH6McngT3239fivx/AX9nfm93EZVlhjU09Y/flcSxLcFraqOUHFEVRZiGFZMsoiqIoGaLiriiKMgtRcVcURZmFqLgriqLMQlTcFUVRZiEq7kpOEJGoXb1vu1gVIz9j5wOP9ZzFIvK+aWzDn9vVBX8yDcd6SkSSFj0WkQ+KXdlTRD4qIn80iWNXisjH4/6fJyI/n1qLldONQqotoxQ2A8aY88Cd7HE3VpGoL43xnMVYsxTvnqY2fBwrx/jgNB1vTIwx/z7+XimpxGrr9+zjHAPeNU3NUk4TNHJXco6xppJ/BPiEPftwsYg8KyKb7T+ncNTXsIrFbRWRT4uIV0T+UUQ2ilWP/E9THd++Kthm/33K3vbvWBURHxSRT4/a/4Mi8isR+Z1db/tL9vbFkliv+y9E5MtxT/2AiLxgv07S9HcR+bKI/IV9f7mIPGZftWwWkWUiUioij9v/vyEi74zr9zK73/8Y3w4RCYnIj+z9t4jIVXF9+IXdh72S4zrtysxDI3clLxhjDti2TD1W3ffrjDGDIrICa/bfOuDzWNU/3wYgIh8BuowxF4pIEHheRH4fH4mLyAXAh4CLsAo2vSwiTxtjPioiNwBXmdRlVddj1cHpBzaKyG+A8cqvlhhjLhWRK7AqYZ49xr4/wSpZ/Uuxiql5sApL3WSM6ban078kIg/a/T477kpncdxx7rDfvzUicgbwe7HKcQCcB6wFhoDdIvJtY8yRcfqgzFJU3JV84lTL8wPfEZHzsKoArkyz//VY9VIci6ICq05HvM2yAfilMaYPQER+gVU1cMs4bXnU2AXD7OdswKpnNBY/Bau2t4iUy0gJ1wREpAyYb4z5pb3/oL3dD3zVPjnEsErDNozzmhuAb9vH2SUihxl5vx43xnTZx96BNbVdxf00RcVdyQtiFeWKYkXtXwJOAudiRbSD6Z4G/Jkx5pGxDj3JJo2uw2GwSuHGW5ehFPuM9f94bXo/Vq31C4wxYRE5lOI1Mj0WWBG7QxT9fZ/WqOeu5BwRqcOqmPcdYxU3qgCOG2NiWMXHvPauPVh1sR0eAT5mR7yIyEq7WmA8zwA3ilUNsQS4Catu+XhcJyLVIlKEtYDF81gnnHqx1rgNYlUfjOc9djs2YNlFXakObIzpBppF5EZ7/6CIFNv9PmUL+1VYkXaqfo/u3/ud/gOLsIpRKUoCemZXckWRiGzFsmAiwP8C/2w/9j3gfhF5N/AkVh14sCrlRUTkNeC/sar/LQY2i4gALYxa6s4Ys1lE/htrvVOA/zLGjGfJADxnt2k5cLcxZhOAiPwtVqXCg8CuUc/pEJEXsJaI+/A4x78V+A/7eGGslXp+AvxaRDZhVdLcZfehTUSetwdRHwa+G3ec7wH/LiJvYL2PHzTGDElu1+pWCgCtCqmc9ojIB7EWW/5EvtuiKNOF2jKKoiizEI3cFUVRZiEauSuKosxCVNwVRVFmISruiqIosxAVd0VRlFmIiruiKMos5P8BvgpsHe6ZbO4AAAAASUVORK5CYII=\n" }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "mean_title_len.compute().plot()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This was a very quick overview. 
The [dask docs](https://www.dask.org/get-started) go into much more detail, as do the Hugging Face [datasets docs](https://huggingface.co/docs/datasets/). \n" ] } ], "metadata": { "colab": { "name": "scratchpad", "provenance": [] }, "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.8.2" }, "widgets": { "application/vnd.jupyter.widget-state+json": { "0c0752e8e9024977979562531ad5a4b7": { "model_module": "@jupyter-widgets/controls", "model_module_version": "1.5.0", "model_name": "ProgressStyleModel", "state": { "_model_module": "@jupyter-widgets/controls", "_model_module_version": "1.5.0", "_model_name": "ProgressStyleModel", "_view_count": null, "_view_module": "@jupyter-widgets/base", "_view_module_version": "1.2.0", "_view_name": "StyleView", "bar_color": null, "description_width": "" } }, "408dd31921ee40bb9c9fc8995d4b8577": { "model_module": "@jupyter-widgets/controls", "model_module_version": "1.5.0", "model_name": "HBoxModel", "state": { "_dom_classes": [], "_model_module": "@jupyter-widgets/controls", "_model_module_version": "1.5.0", "_model_name": "HBoxModel", "_view_count": null, "_view_module": "@jupyter-widgets/controls", "_view_module_version": "1.5.0", "_view_name": "HBoxView", "box_style": "", "children": [ "IPY_MODEL_5e1150b6dfb8494195191efaeb5b7feb", "IPY_MODEL_6edda0b1149541499ab485df06711944", "IPY_MODEL_45fcaeca1a184768a624607172ab7d72" ], "layout": "IPY_MODEL_ea28bfd1711148c29b494a0194d2ffbf" } }, "45fcaeca1a184768a624607172ab7d72": { "model_module": "@jupyter-widgets/controls", "model_module_version": "1.5.0", "model_name": "HTMLModel", "state": { "_dom_classes": [], "_model_module": "@jupyter-widgets/controls", "_model_module_version": "1.5.0", "_model_name": "HTMLModel", 
"_view_count": null, "_view_module": "@jupyter-widgets/controls", "_view_module_version": "1.5.0", "_view_name": "HTMLView", "description": "", "description_tooltip": null, "layout": "IPY_MODEL_4ccb508cf3ab4001a30cea47d2448e23", "placeholder": "​", "style": "IPY_MODEL_88809da60e3846dc886cd2545edbc5c3", "value": " 1/1 [00:00<00:00, 21.14it/s]" } }, "4ccb508cf3ab4001a30cea47d2448e23": { "model_module": "@jupyter-widgets/base", "model_module_version": "1.2.0", "model_name": "LayoutModel", "state": { "_model_module": "@jupyter-widgets/base", "_model_module_version": "1.2.0", "_model_name": "LayoutModel", "_view_count": null, "_view_module": "@jupyter-widgets/base", "_view_module_version": "1.2.0", "_view_name": "LayoutView", "align_content": null, "align_items": null, "align_self": null, "border": null, "bottom": null, "display": null, "flex": null, "flex_flow": null, "grid_area": null, "grid_auto_columns": null, "grid_auto_flow": null, "grid_auto_rows": null, "grid_column": null, "grid_gap": null, "grid_row": null, "grid_template_areas": null, "grid_template_columns": null, "grid_template_rows": null, "height": null, "justify_content": null, "justify_items": null, "left": null, "margin": null, "max_height": null, "max_width": null, "min_height": null, "min_width": null, "object_fit": null, "object_position": null, "order": null, "overflow": null, "overflow_x": null, "overflow_y": null, "padding": null, "right": null, "top": null, "visibility": null, "width": null } }, "5d3ee4cdf67e44878bac361e143b828a": { "model_module": "@jupyter-widgets/controls", "model_module_version": "1.5.0", "model_name": "DescriptionStyleModel", "state": { "_model_module": "@jupyter-widgets/controls", "_model_module_version": "1.5.0", "_model_name": "DescriptionStyleModel", "_view_count": null, "_view_module": "@jupyter-widgets/base", "_view_module_version": "1.2.0", "_view_name": "StyleView", "description_width": "" } }, "5e1150b6dfb8494195191efaeb5b7feb": { "model_module": 
"@jupyter-widgets/controls", "model_module_version": "1.5.0", "model_name": "HTMLModel", "state": { "_dom_classes": [], "_model_module": "@jupyter-widgets/controls", "_model_module_version": "1.5.0", "_model_name": "HTMLModel", "_view_count": null, "_view_module": "@jupyter-widgets/controls", "_view_module_version": "1.5.0", "_view_name": "HTMLView", "description": "", "description_tooltip": null, "layout": "IPY_MODEL_b3eaa93967914d7f8977d736611b845f", "placeholder": "​", "style": "IPY_MODEL_5d3ee4cdf67e44878bac361e143b828a", "value": "100%" } }, "6edda0b1149541499ab485df06711944": { "model_module": "@jupyter-widgets/controls", "model_module_version": "1.5.0", "model_name": "FloatProgressModel", "state": { "_dom_classes": [], "_model_module": "@jupyter-widgets/controls", "_model_module_version": "1.5.0", "_model_name": "FloatProgressModel", "_view_count": null, "_view_module": "@jupyter-widgets/controls", "_view_module_version": "1.5.0", "_view_name": "ProgressView", "bar_style": "success", "description": "", "description_tooltip": null, "layout": "IPY_MODEL_bff82a2eab2a4f7d87f8037e30839050", "max": 1, "min": 0, "orientation": "horizontal", "style": "IPY_MODEL_0c0752e8e9024977979562531ad5a4b7", "value": 1 } }, "88809da60e3846dc886cd2545edbc5c3": { "model_module": "@jupyter-widgets/controls", "model_module_version": "1.5.0", "model_name": "DescriptionStyleModel", "state": { "_model_module": "@jupyter-widgets/controls", "_model_module_version": "1.5.0", "_model_name": "DescriptionStyleModel", "_view_count": null, "_view_module": "@jupyter-widgets/base", "_view_module_version": "1.2.0", "_view_name": "StyleView", "description_width": "" } }, "b3eaa93967914d7f8977d736611b845f": { "model_module": "@jupyter-widgets/base", "model_module_version": "1.2.0", "model_name": "LayoutModel", "state": { "_model_module": "@jupyter-widgets/base", "_model_module_version": "1.2.0", "_model_name": "LayoutModel", "_view_count": null, "_view_module": "@jupyter-widgets/base", 
"_view_module_version": "1.2.0", "_view_name": "LayoutView", "align_content": null, "align_items": null, "align_self": null, "border": null, "bottom": null, "display": null, "flex": null, "flex_flow": null, "grid_area": null, "grid_auto_columns": null, "grid_auto_flow": null, "grid_auto_rows": null, "grid_column": null, "grid_gap": null, "grid_row": null, "grid_template_areas": null, "grid_template_columns": null, "grid_template_rows": null, "height": null, "justify_content": null, "justify_items": null, "left": null, "margin": null, "max_height": null, "max_width": null, "min_height": null, "min_width": null, "object_fit": null, "object_position": null, "order": null, "overflow": null, "overflow_x": null, "overflow_y": null, "padding": null, "right": null, "top": null, "visibility": null, "width": null } }, "bff82a2eab2a4f7d87f8037e30839050": { "model_module": "@jupyter-widgets/base", "model_module_version": "1.2.0", "model_name": "LayoutModel", "state": { "_model_module": "@jupyter-widgets/base", "_model_module_version": "1.2.0", "_model_name": "LayoutModel", "_view_count": null, "_view_module": "@jupyter-widgets/base", "_view_module_version": "1.2.0", "_view_name": "LayoutView", "align_content": null, "align_items": null, "align_self": null, "border": null, "bottom": null, "display": null, "flex": null, "flex_flow": null, "grid_area": null, "grid_auto_columns": null, "grid_auto_flow": null, "grid_auto_rows": null, "grid_column": null, "grid_gap": null, "grid_row": null, "grid_template_areas": null, "grid_template_columns": null, "grid_template_rows": null, "height": null, "justify_content": null, "justify_items": null, "left": null, "margin": null, "max_height": null, "max_width": null, "min_height": null, "min_width": null, "object_fit": null, "object_position": null, "order": null, "overflow": null, "overflow_x": null, "overflow_y": null, "padding": null, "right": null, "top": null, "visibility": null, "width": null } }, "ea28bfd1711148c29b494a0194d2ffbf": { 
"model_module": "@jupyter-widgets/base", "model_module_version": "1.2.0", "model_name": "LayoutModel", "state": { "_model_module": "@jupyter-widgets/base", "_model_module_version": "1.2.0", "_model_name": "LayoutModel", "_view_count": null, "_view_module": "@jupyter-widgets/base", "_view_module_version": "1.2.0", "_view_name": "LayoutView", "align_content": null, "align_items": null, "align_self": null, "border": null, "bottom": null, "display": null, "flex": null, "flex_flow": null, "grid_area": null, "grid_auto_columns": null, "grid_auto_flow": null, "grid_auto_rows": null, "grid_column": null, "grid_gap": null, "grid_row": null, "grid_template_areas": null, "grid_template_columns": null, "grid_template_rows": null, "height": null, "justify_content": null, "justify_items": null, "left": null, "margin": null, "max_height": null, "max_width": null, "min_height": null, "min_width": null, "object_fit": null, "object_position": null, "order": null, "overflow": null, "overflow_x": null, "overflow_y": null, "padding": null, "right": null, "top": null, "visibility": null, "width": null } } } } }, "nbformat": 4, "nbformat_minor": 4 }