{ "cells": [ { "cell_type": "markdown", "id": "20a04b0c-06cb-4f29-8381-a6a0d4a20ccd", "metadata": {}, "source": [ "# Wordcloud plots\n", "For text exploration, it can be useful to visualize texts as data points and interact with them." ] }, { "cell_type": "code", "execution_count": 1, "id": "8d301701-368f-4365-b555-dae6f06d8bea", "metadata": {}, "outputs": [], "source": [ "import stackview\n", "import pandas as pd" ] }, { "cell_type": "markdown", "id": "a991df41-86ab-4188-af47-e6e0cf6d7b32", "metadata": {}, "source": [ "Here we reuse a list of sentences and a [UMAP](https://umap-learn.readthedocs.io/en/latest/) produced from their text embeddings. The sentences are taken from [Haase et al. 2022](https://arxiv.org/abs/2204.07547), which is licensed under [CC-BY 4.0](https://creativecommons.org/licenses/by/4.0)." ] }, { "cell_type": "code", "execution_count": 2, "id": "bb75ed7e-aa83-4015-a74b-8dfb9405ecf1", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
|   | Unnamed: 0 | sentence | UMAP0 | UMAP1 |
|---|---|---|---|---|
| 0 | 0 | A Hitchhiker’s Guide through the Bio-image Ana... | -2.863276 | 8.680281 |
| 1 | 1 | Modern research in the life sciences is unthin... | -3.731295 | 7.875060 |
| 2 | 2 | In the past decade, we observed a dramatic inc... | -4.748690 | 6.128065 |
| 3 | 3 | As it is increasingly difficult to keep track ... | -4.183692 | 6.847530 |
| 4 | 4 | We give guidance on which aspects to consider ... | -4.912832 | 6.691180 |