{ "cells": [ { "cell_type": "markdown", "id": "fa6fc62e-901e-4bff-9b08-2392391d9415", "metadata": {}, "source": [ "# Using n-grams in Text-Fabric (N1904-TF)" ] }, { "cell_type": "markdown", "id": "c311e7fb-1c3e-4a8f-95b2-6d836468afbd", "metadata": {}, "source": [ "## Table of contents (TOC)\n", "* 1 - Introduction\n", "* 2 - Load Text-Fabric app and data\n", "* 3 - Extracting n-grams\n", " * 3.1 - Define the n-gram size\n", " * 3.2 - Iterate through the text and extract n-grams\n", "* 4 - Analyzing the n-grams\n", " * 4.1 - N-gram frequency analysis\n", " * 4.2 - POS-sequence frequency analysis\n", " * 4.3 - N-grams and ambiguity\n", " * 4.4 - Correlating preceding n-grams with final POS tagging\n", "* 5 - Required libraries\n", "* 6 - Notebook and environment details" ] }, { "cell_type": "markdown", "id": "0e808d8e-91f5-498a-b04a-dcc487277f18", "metadata": {}, "source": [ "# 1 - Introduction \n", "##### [Back to TOC](#TOC)\n", "\n", "An n-gram is a contiguous sequence of n items (typically words or characters) from a given text or speech. In a text corpus, n-grams are used to analyze patterns of word usage and co-occurrence by sliding a window of size n over the text. For example, a 1-gram (unigram) analysis looks at individual words, while a 2-gram (bigram) analysis examines pairs of consecutive words. N-grams are particularly useful in natural language processing (NLP) for tasks like text prediction, language modeling, and understanding the structure and context within a large corpus of texts. This notebook shows how to create n-grams within the Text-Fabric environment." 
] }, { "cell_type": "markdown", "id": "9409098f-b094-48ab-b76a-97272b44ea00", "metadata": {}, "source": [ "# 2 - Load app and data \n", "##### [Back to TOC](#TOC)" ] }, { "cell_type": "code", "execution_count": 3, "id": "8fc72cc8-6148-49e2-9cf5-9552cca7bf8a", "metadata": {}, "outputs": [], "source": [ "%load_ext autoreload\n", "%autoreload 2" ] }, { "cell_type": "code", "execution_count": 4, "id": "a30445d0-3299-4cf2-9ce4-50d17110a6da", "metadata": {}, "outputs": [], "source": [ "# Loading the Text-Fabric code\n", "# Note: it is assumed Text-Fabric is installed in your environment\n", "from tf.fabric import Fabric\n", "from tf.app import use" ] }, { "cell_type": "code", "execution_count": 7, "id": "1c331d20-2d5c-4af2-af50-6042be0c34cc", "metadata": { "scrolled": true, "tags": [] }, "outputs": [ { "data": { "text/markdown": [ "**Locating corpus resources ...**" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "app: ~/text-fabric-data/github/CenterBLC/N1904/app" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "data: ~/text-fabric-data/github/CenterBLC/N1904/tf/1.0.0" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "\n", " TF: TF API 12.5.5, CenterBLC/N1904/app v3, Search Reference
\n", " Data: CenterBLC - N1904 1.0.0, Character table, Feature docs
\n", "
Node types\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", "\n", "\n", " \n", " \n", " \n", " \n", "\n", "\n", "\n", " \n", " \n", " \n", " \n", "\n", "\n", "\n", " \n", " \n", " \n", " \n", "\n", "\n", "\n", " \n", " \n", " \n", " \n", "\n", "\n", "\n", " \n", " \n", " \n", " \n", "\n", "\n", "\n", " \n", " \n", " \n", " \n", "\n", "\n", "\n", " \n", " \n", " \n", " \n", "\n", "\n", "\n", " \n", " \n", " \n", " \n", "\n", "\n", "\n", " \n", " \n", " \n", " \n", "\n", "\n", "\n", " \n", " \n", " \n", " \n", "\n", "
Name        # of nodes   # slots / node   % coverage
book        27           5102.93          100
chapter     260          529.92           100
verse       7944         17.34            100
sentence    8011         17.20            100
group       8945         7.01             46
clause      42506        8.36             258
wg          106868       6.88             533
phrase      69007        1.90             95
subphrase   116178       1.60             135
word        137779       1.00             100
\n", " Sets: no custom sets
\n", " Features:
\n", "
Nestle 1904 Greek New Testament\n", "
\n", "\n", "
\n", "
\n", "after\n", "
\n", "
str
\n", "\n", " material after the end of the word\n", "\n", "
\n", "\n", "
\n", " \n", "
int
\n", "\n", " 1 if it is an apposition container\n", "\n", "
\n", "\n", "
\n", "
\n", "articular\n", "
\n", "
int
\n", "\n", " 1 if the sentence, group, clause, phrase or wg has an article\n", "\n", "
\n", "\n", "
\n", "
\n", "before\n", "
\n", "
str
\n", "\n", " this is XML attribute before\n", "\n", "
\n", "\n", "
\n", "
\n", "book\n", "
\n", "
str
\n", "\n", " book name (full name)\n", "\n", "
\n", "\n", "
\n", "
\n", "bookshort\n", "
\n", "
str
\n", "\n", " book name (abbreviated) from ref attribute in xml\n", "\n", "
\n", "\n", "
\n", "
\n", "case\n", "
\n", "
str
\n", "\n", " grammatical case\n", "\n", "
\n", "\n", "
\n", "
\n", "chapter\n", "
\n", "
int
\n", "\n", " chapter number, from ref attribute in xml\n", "\n", "
\n", "\n", "
\n", "
\n", "clausetype\n", "
\n", "
str
\n", "\n", " clause type\n", "\n", "
\n", "\n", "
\n", "
\n", "cls\n", "
\n", "
str
\n", "\n", " this is XML attribute cls\n", "\n", "
\n", "\n", "
\n", "
\n", "cltype\n", "
\n", "
str
\n", "\n", " clause type\n", "\n", "
\n", "\n", "
\n", "
\n", "criticalsign\n", "
\n", "
str
\n", "\n", " this is XML attribute criticalsign\n", "\n", "
\n", "\n", "
\n", "
\n", "crule\n", "
\n", "
str
\n", "\n", " clause rule (from xml attribute Rule)\n", "\n", "
\n", "\n", "
\n", "
\n", "degree\n", "
\n", "
str
\n", "\n", " grammatical degree\n", "\n", "
\n", "\n", "
\n", "
\n", "discontinuous\n", "
\n", "
int
\n", "\n", " 1 if the word is out of sequence in the xml\n", "\n", "
\n", "\n", "
\n", "
\n", "domain\n", "
\n", "
str
\n", "\n", " domain\n", "\n", "
\n", "\n", "
\n", "
\n", "framespec\n", "
\n", "
str
\n", "\n", " this is XML attribute framespec\n", "\n", "
\n", "\n", "
\n", "
\n", "function\n", "
\n", "
str
\n", "\n", " this is XML attribute function\n", "\n", "
\n", "\n", "
\n", "
\n", "gender\n", "
\n", "
str
\n", "\n", " grammatical gender\n", "\n", "
\n", "\n", "
\n", "
\n", "gloss\n", "
\n", "
str
\n", "\n", " English gloss (BGVB)\n", "\n", "
\n", "\n", "
\n", "
\n", "id\n", "
\n", "
str
\n", "\n", " xml id\n", "\n", "
\n", "\n", "
\n", "
\n", "junction\n", "
\n", "
str
\n", "\n", " type of junction\n", "\n", "
\n", "\n", "
\n", "
\n", "lang\n", "
\n", "
str
\n", "\n", " language the text is in\n", "\n", "
\n", "\n", "
\n", "
\n", "lemma\n", "
\n", "
str
\n", "\n", " lexical lemma\n", "\n", "
\n", "\n", "
\n", "
\n", "lemmatranslit\n", "
\n", "
str
\n", "\n", " transliteration of the word lemma\n", "\n", "
\n", "\n", "
\n", "
\n", "ln\n", "
\n", "
str
\n", "\n", " ln\n", "\n", "
\n", "\n", "
\n", "
\n", "mood\n", "
\n", "
str
\n", "\n", " verbal mood\n", "\n", "
\n", "\n", "
\n", "
\n", "morph\n", "
\n", "
str
\n", "\n", " morphological code\n", "\n", "
\n", "\n", "
\n", "
\n", "nodeid\n", "
\n", "
str
\n", "\n", " node id (as in the XML source data)\n", "\n", "
\n", "\n", "
\n", "
\n", "normalized\n", "
\n", "
str
\n", "\n", " lemma normalized\n", "\n", "
\n", "\n", "
\n", "
\n", "note\n", "
\n", "
str
\n", "\n", " annotation of linguistic nature\n", "\n", "
\n", "\n", "
\n", "
\n", "num\n", "
\n", "
int
\n", "\n", " generated number (not in xml): book: (Matthew=1, Mark=2, ..., Revelation=27); sentence: numbered per chapter; word: numbered per verse.\n", "\n", "
\n", "\n", "
\n", "
\n", "number\n", "
\n", "
str
\n", "\n", " grammatical number\n", "\n", "
\n", "\n", "
\n", "
\n", "otype\n", "
\n", "
str
\n", "\n", " \n", "\n", "
\n", "\n", "
\n", "
\n", "person\n", "
\n", "
str
\n", "\n", " grammatical person\n", "\n", "
\n", "\n", "
\n", "
\n", "punctuation\n", "
\n", "
str
\n", "\n", " punctuation found after a word\n", "\n", "
\n", "\n", "
\n", "
\n", "ref\n", "
\n", "
str
\n", "\n", " biblical reference with word counting\n", "\n", "
\n", "\n", "
\n", "
\n", "referent\n", "
\n", "
str
\n", "\n", " number of referent\n", "\n", "
\n", "\n", "
\n", "
\n", "rela\n", "
\n", "
str
\n", "\n", " this is XML attribute rela\n", "\n", "
\n", "\n", "
\n", "
\n", "role\n", "
\n", "
str
\n", "\n", " role\n", "\n", "
\n", "\n", "
\n", "
\n", "rule\n", "
\n", "
str
\n", "\n", " syntactical rule\n", "\n", "
\n", "\n", "
\n", "
\n", "sp\n", "
\n", "
str
\n", "\n", " part-of-speach\n", "\n", "
\n", "\n", "
\n", "
\n", "strong\n", "
\n", "
int
\n", "\n", " strong number\n", "\n", "
\n", "\n", "
\n", "
\n", "subjrefspec\n", "
\n", "
str
\n", "\n", " this is XML attribute subjrefspec\n", "\n", "
\n", "\n", "
\n", "
\n", "tense\n", "
\n", "
str
\n", "\n", " verbal tense\n", "\n", "
\n", "\n", "
\n", "
\n", "text\n", "
\n", "
str
\n", "\n", " the text of a word\n", "\n", "
\n", "\n", "
\n", "
\n", "trailer\n", "
\n", "
str
\n", "\n", " material after the end of the word (excluding critical signs)\n", "\n", "
\n", "\n", "
\n", "
\n", "trans\n", "
\n", "
str
\n", "\n", " translation of the word surface text according to the Berean Interlinear Bible\n", "\n", "
\n", "\n", "
\n", "
\n", "translit\n", "
\n", "
str
\n", "\n", " transliteration of the word surface text\n", "\n", "
\n", "\n", "
\n", "
\n", "typ\n", "
\n", "
str
\n", "\n", " syntactical type (on sentence, group, clause or phrase)\n", "\n", "
\n", "\n", "
\n", "
\n", "typems\n", "
\n", "
str
\n", "\n", " morphological type (on word), syntactical type (on sentence, group, clause, phrase or wg)\n", "\n", "
\n", "\n", "
\n", "
\n", "unaccent\n", "
\n", "
str
\n", "\n", " word in unicode characters without accents and diacritical markers\n", "\n", "
\n", "\n", "
\n", "
\n", "unicode\n", "
\n", "
str
\n", "\n", " word in unicode characters plus material after it\n", "\n", "
\n", "\n", "
\n", "
\n", "variant\n", "
\n", "
str
\n", "\n", " this is XML attribute variant\n", "\n", "
\n", "\n", "
\n", "
\n", "verse\n", "
\n", "
int
\n", "\n", " verse number, from ref attribute in xml\n", "\n", "
\n", "\n", "
\n", "
\n", "voice\n", "
\n", "
str
\n", "\n", " verbal voice\n", "\n", "
\n", "\n", "
\n", "
\n", "frame\n", "
\n", "
str
\n", "\n", " frame\n", "\n", "
\n", "\n", "
\n", "
\n", "oslots\n", "
\n", "
none
\n", "\n", " \n", "\n", "
\n", "\n", "
\n", "
\n", "parent\n", "
\n", "
none
\n", "\n", " parent relationship between words\n", "\n", "
\n", "\n", "
\n", "
\n", "sibling\n", "
\n", "
int
\n", "\n", " this is XML attribute sibling\n", "\n", "
\n", "\n", "
\n", "
\n", "subjref\n", "
\n", "
none
\n", "\n", " number of subject referent\n", "\n", "
\n", "\n", "
\n", "
\n", "\n", " Settings:
specified
  1. apiVersion: 3
  2. appName: CenterBLC/N1904
  3. appPath: C:/Users/tonyj/text-fabric-data/github/CenterBLC/N1904/app
  4. commit: gdb630837ae89b9468c9e50d13bda05cfd3de4f18
  5. css: ''
  6. dataDisplay:
    • excludedFeatures: []
    • noneValues:
      • none
      • unknown
      • no value
      • NA
    • sectionSep1:
    • sectionSep2: :
    • textFormat: text-orig-full
  7. docs:
    • docBase: https://github.com/CenterBLC/N1904/tree/main/docs
    • docPage: about
    • docRoot: https://github.com/CenterBLC/N1904
    • featureBase:https://github.com/CenterBLC/N1904/blob/main/docs/features/<feature>.md
    • featurePage: README
  8. interfaceDefaults: {fmt: text-orig-full}
  9. isCompatible: True
  10. local: local
  11. localDir:C:/Users/tonyj/text-fabric-data/github/CenterBLC/N1904/_temp
  12. provenanceSpec:
    • branch: main
    • corpus: Nestle 1904 Greek New Testament
    • doi: 10.5281/zenodo.13117910
    • moduleSpecs: []
    • org: CenterBLC
    • relative: /tf
    • repo: N1904
    • repro: N1904
    • version: 1.0.0
    • webBase: https://learner.bible/text/show_text/nestle1904/
    • webHint: Show this on the website
    • webLang: en
    • webUrl:https://learner.bible/text/show_text/nestle1904/<1>/<2>/<3>
    • webUrlLex: {webBase}/word?version={version}&id=<lid>
  13. release: 1.0.0
  14. typeDisplay:
    • clause:
      • condense: True
      • label: {typ} {function} {rela} \\\\ {cls} {role} {junction}
      • style: ''
    • group:
      • label: {typ} {function} {rela} \\\\ {typems} {role} {rule}
      • style: ''
    • phrase:
      • condense: True
      • label: {typ} {function} {rela} \\\\ {typems} {role} {rule}
      • style: ''
    • sentence:
      • label: {typ} {function} {rela} \\\\ {role} {rule}
      • style: ''
    • subphrase:
      • label: {typ} {function} {rela} \\\\ {typems} {role} {rule}
      • style: ''
    • verse:
      • condense: True
      • label: {book} {chapter}:{verse}
      • style: ''
    • wg:
      • condense: True
      • label: {typems} {role} {rule} {junction}
      • style: ''
    • word:
      • features:
        • lemma
        • sp
      • featuresBare: [gloss]
  15. writing: grc
\n" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "\n", "\n" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "
TF API: names N F E L T S C TF Fs Fall Es Eall Cs Call directly usable

" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/markdown": [ "Display is setup for viewtype [syntax-view](https://github.com/CenterBLC/N1904/blob/main/docs/syntax-view.md#start)" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/markdown": [ "See [here](https://github.com/CenterBLC/N1904/blob/main/docs/viewtypes.md#start) for more information on viewtypes" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "# load the N1904 app and data\n", "N1904 = use (\"CenterBLC/N1904\", version=\"1.0.0\", hoist=globals())" ] }, { "cell_type": "code", "execution_count": 8, "id": "81e79d70-8c6a-4ddf-890c-f8af2072da27", "metadata": {}, "outputs": [ { "data": { "text/html": [ "" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "# The following will push the Text-Fabric stylesheet to this notebook (to facilitate proper display with notebook viewer)\n", "N1904.dh(N1904.getCss())" ] }, { "cell_type": "markdown", "id": "c8a39b26-0a59-4142-bd1a-b9b27baab294", "metadata": {}, "source": [ "# 3 - Extracting n-grams \n", "##### [Back to TOC](#TOC)\n", "\n", "We'll extract n-grams of words along with their POS tags.\n", "\n", "In the following script we rely upon the following Text-Fabric features:\n", " - Greek word in Unicode from feature [unicode](https://centerblc.github.io/N1904/features/unicode.html#start)\n", " - Part of speech tag from feature [sp](https://centerblc.github.io/N1904/features/sp.html#start).\n", " - The lemma (dictionary form) from feature [lemma](https://centerblc.github.io/N1904/features/lemma.html#start).\n", " - The morphological tag from feature [morph](https://centerblc.github.io/N1904/features/morph.html#start)." 
] }, { "cell_type": "markdown", "id": "2d43a656-beba-4336-a28a-d83070616737", "metadata": {}, "source": [ "## 3.1 - Define the n-gram size \n", "\n", "Set the size of the n-grams you wish to extract, for example `n = 2` for bigrams or `n = 3` for trigrams. A lower n (e.g., 1 or 2) gives a more fine-grained analysis of smaller units, which can be useful for basic statistics or initial insights. A higher n (e.g., 3 or more) captures longer and more structured relationships, which can be useful in more complex linguistic or computational tasks." ] }, { "cell_type": "code", "execution_count": 39, "id": "fcd15761-5c4c-41ab-a0b6-161c9198f53e", "metadata": {}, "outputs": [], "source": [ "# Setting the size of the n-gram\n", "n = 3  # 3 gives trigrams; use 2 for bigrams, etc." ] }, { "cell_type": "markdown", "id": "16579093-d32e-4e8a-a352-749e3d93eb91", "metadata": {}, "source": [ "## 3.2 - Iterate through the text and extract n-grams \n", "\n", "The following script first retrieves all chapter nodes with `F.otype.s('chapter')` and gets their word descendants using `L.d(chapterNode, 'word')`. The `extractNGrams(words, n)` function generates n-grams from these word lists. Because extraction runs per chapter, no n-gram spans a chapter boundary. The features unicode, sp, lemma, and morph are used to create each n-gram item. Each n-gram is stored as a list of dictionaries (one per word), and the first five are printed for verification." 
] }, { "cell_type": "code", "execution_count": 42, "id": "d95453b8-0b59-4181-a6c2-415b6afbe0de", "metadata": {}, "outputs": [], "source": [ "# Function to extract n-grams from a list of words\n", "def extractNGrams(words, n):\n", "    return [words[i:i+n] for i in range(len(words) - n + 1)]\n", "\n", "# Collect all n-grams in a list\n", "allNGrams = []\n", "\n", "# Iterate over all chapters in the New Testament\n", "for chapterNode in F.otype.s('chapter'):\n", "    wordsInChapter = L.d(chapterNode, 'word')\n", "    nGramsInChapter = extractNGrams(wordsInChapter, n)\n", "\n", "    for nGram in nGramsInChapter:\n", "        nGramData = []\n", "        for wordNode in nGram:\n", "            wordText = F.unicode.v(wordNode)  # Greek word in Unicode\n", "            posTag = F.sp.v(wordNode)  # Part of Speech\n", "            lemma = F.lemma.v(wordNode)  # Lemma of the word\n", "            morph = F.morph.v(wordNode)  # Morphological code\n", "\n", "            # Collect data for each word in the n-gram\n", "            nGramData.append({\n", "                'wordText': wordText,\n", "                'posTag': posTag,\n", "                'lemma': lemma,\n", "                'morph': morph\n", "            })\n", "\n", "        # Add the n-gram data to the list\n", "        allNGrams.append(nGramData)\n" ] }, { "cell_type": "code", "execution_count": 44, "id": "b6959340-8034-42d0-9202-0ba8ee98f0e0", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Words: Βίβλος γενέσεως Ἰησοῦ\n", "POS Tags: ['subs', 'subs', 'subs']\n", "--------------------------------------------------\n", "Words: γενέσεως Ἰησοῦ Χριστοῦ\n", "POS Tags: ['subs', 'subs', 'subs']\n", "--------------------------------------------------\n", "Words: Ἰησοῦ Χριστοῦ υἱοῦ\n", "POS Tags: ['subs', 'subs', 'subs']\n", "--------------------------------------------------\n", "Words: Χριστοῦ υἱοῦ Δαυεὶδ\n", "POS Tags: ['subs', 'subs', 'subs']\n", "--------------------------------------------------\n", "Words: υἱοῦ Δαυεὶδ υἱοῦ\n", "POS Tags: ['subs', 'subs', 'subs']\n", "--------------------------------------------------\n" ] } ], "source": [ "# Verification: print the first 5 n-grams\n", "for nGramData in allNGrams[:5]:\n", "    words = [wordData['wordText'] for wordData in nGramData]\n", "    posTags = [wordData['posTag'] for wordData in nGramData]\n", "    print(f\"Words: {' '.join(words)}\")\n", "    print(f\"POS Tags: {posTags}\")\n", "    print('-' * 50)" ] }, { "cell_type": "markdown", "id": "e403c6e9-6682-4183-ba97-ce12270fdd1c", "metadata": {}, "source": [ "# 4 - Analyzing the n-grams \n", "##### [Back to TOC](#TOC)\n", "\n", "Once the n-grams are extracted, we can analyze them in various ways. This section provides a few examples." ] }, { "cell_type": "markdown", "id": "3cd6b3a5-099c-419c-bb87-22948ef5d086", "metadata": {}, "source": [ "## 4.1 - N-gram frequency analysis \n", "\n", "The following script provides a statistical overview of the ten most frequent n-grams. " ] }, { "cell_type": "code", "execution_count": 45, "id": "f5dc6f4b-a1da-4274-9ea3-4536eae24172", "metadata": {}, "outputs": [ { "data": { "text/html": [ "

Most common n-grams

" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
   N-Gram  Frequency
0  ὁ Υἱὸς τοῦ  60
1  ὁ δὲ εἶπεν  54
2  τοῦ Κυρίου ἡμῶν  47
3  λέγω ὑμῖν ὅτι  42
4  ὁ δὲ Ἰησοῦς  42
5  Υἱὸς τοῦ ἀνθρώπου  41
6  τοῦ Θεοῦ καὶ  40
7  αὐτοῖς ὁ Ἰησοῦς  39
8  ὁ Ἰησοῦς εἶπεν  38
9  δὲ ἐγέννησεν τὸν  37
\n", "
" ], "text/plain": [ " N-Gram Frequency\n", "0 ὁ Υἱὸς τοῦ 60\n", "1 ὁ δὲ εἶπεν 54\n", "2 τοῦ Κυρίου ἡμῶν 47\n", "3 λέγω ὑμῖν ὅτι 42\n", "4 ὁ δὲ Ἰησοῦς 42\n", "5 Υἱὸς τοῦ ἀνθρώπου 41\n", "6 τοῦ Θεοῦ καὶ 40\n", "7 αὐτοῖς ὁ Ἰησοῦς 39\n", "8 ὁ Ἰησοῦς εἶπεν 38\n", "9 δὲ ἐγέννησεν τὸν 37" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "from collections import Counter\n", "import pandas as pd\n", "from IPython.display import display, HTML\n", "\n", "# Convert n-grams to tuples of word texts for counting\n", "nGramTuples = [tuple(wordData['wordText'] for wordData in nGram) for nGram in allNGrams]\n", "\n", "# Count the frequency of each n-gram\n", "nGramFrequency = Counter(nGramTuples)\n", "\n", "# Prepare the data for the DataFrame\n", "nGramData = [{'N-Gram': ' '.join(nGram), 'Frequency': freq} for nGram, freq in nGramFrequency.most_common(10)]\n", "\n", "# Create a pandas DataFrame\n", "df = pd.DataFrame(nGramData)\n", "\n", "# Display the title and the DataFrame\n", "display(HTML(\"

Most common n-grams

\")) \n", "display(df)" ] }, { "cell_type": "markdown", "id": "50f8cee9-7bac-42ef-b15e-3c0183b2d58b", "metadata": {}, "source": [ "## 4.2 - POS-sequence frequency analysis " ] }, { "cell_type": "code", "execution_count": 46, "id": "8b51006a-2429-4108-bbdd-aeb2bda4e3dc", "metadata": {}, "outputs": [ { "data": { "text/html": [ "

Most common POS tag sequences

" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
   POS Tag Sequence  Frequency
0  prep art subs  3568
1  verb art subs  3513
2  art subs pron  3390
3  art subs art  2846
4  subs art subs  2765
5  art subs verb  2666
6  art subs conj  2402
7  verb prep art  1837
8  conj art subs  1818
9  subs conj verb  1719
\n", "
" ], "text/plain": [ " POS Tag Sequence Frequency\n", "0 prep art subs 3568\n", "1 verb art subs 3513\n", "2 art subs pron 3390\n", "3 art subs art 2846\n", "4 subs art subs 2765\n", "5 art subs verb 2666\n", "6 art subs conj 2402\n", "7 verb prep art 1837\n", "8 conj art subs 1818\n", "9 subs conj verb 1719" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "from collections import Counter\n", "import pandas as pd\n", "from IPython.display import display, HTML\n", "\n", "# Extract POS tag sequences from n-grams\n", "posSequences = [tuple(wordData['posTag'] for wordData in nGram) for nGram in allNGrams]\n", "\n", "# Count the frequency of POS tag sequences\n", "posSequenceFrequency = Counter(posSequences)\n", "\n", "# Prepare the data for the DataFrame\n", "posSequenceData = [{'POS Tag Sequence': ' '.join(posSeq), 'Frequency': freq} for posSeq, freq in posSequenceFrequency.most_common(10)]\n", "\n", "# Create a pandas DataFrame\n", "df_pos = pd.DataFrame(posSequenceData)\n", "\n", "# Display the title and the DataFrame\n", "display(HTML(\"

Most common POS tag sequences

\")) # Title added here\n", "display(df_pos)" ] }, { "cell_type": "markdown", "id": "3c201844-fb6c-4b76-a1eb-1f9f6a64f050", "metadata": {}, "source": [ "## 4.3 - N-grams and ambiguity \n", "\n", "N-grams can be used to improve POS tagging by identifying the contexts in which words commonly appear. For instance, a word that is often preceded by a definite article is likely a noun. N-grams can also be used for disambiguation. Take the word \"ἕως\", which according to the MACULA XML Treebank can be a preposition, a conjunction, or an adverb in the Greek New Testament. The following script generates a pie chart displaying the frequency distribution of the POS tags assigned to \"ἕως\" when it is the last word of an n-gram." ] }, { "cell_type": "code", "execution_count": 47, "id": "e5f0cf11-943c-49e2-8a42-1b5a8b212969", "metadata": {}, "outputs": [ { "data": { "image/png": "", "text/plain": [ "
" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "import pandas as pd\n", "import matplotlib.pyplot as plt\n", "from collections import Counter\n", "\n", "# Define the target word in unicode\n", "targetWord = 'ἕως'\n", "\n", "# Prepare list to store the POS tags for \"ἕως\" when it's the last word\n", "posTagsForTargetWord = []\n", "\n", "# Find n-grams where \"ἕως\" is the last word and collect its POS tag\n", "for nGramData in allNGrams:\n", "    words = [wordData['wordText'] for wordData in nGramData]\n", "    if words[-1] == targetWord:  # Only include if \"ἕως\" is the last word\n", "        posTagsForTargetWord.append(nGramData[-1]['posTag'])  # Collect the POS tag of \"ἕως\"\n", "\n", "# Count the frequency of each POS tag for \"ἕως\"\n", "posTagFrequency = Counter(posTagsForTargetWord)\n", "\n", "# Prepare data for the pie chart\n", "labels = list(posTagFrequency.keys())\n", "sizes = list(posTagFrequency.values())\n", "\n", "# Function to display absolute numbers and percentages on the pie chart\n", "def absolute_and_percentage(pct, allvals):\n", "    # Round rather than truncate, so floating-point error cannot lower a count by one\n", "    absolute = int(round(pct / 100. * sum(allvals)))\n", "    return f\"{absolute} ({pct:.1f}%)\"\n", "\n", "# Plot the pie chart\n", "plt.figure(figsize=(7,7))\n", "plt.pie(sizes, labels=labels, autopct=lambda pct: absolute_and_percentage(pct, sizes),\n", "        startangle=140, shadow=False)\n", "plt.title(\"Distribution of POS tags for 'ἕως' as the last n-gram word\")\n", "plt.axis('equal')  # Equal aspect ratio ensures that pie chart is drawn as a circle.\n", "\n", "# Show the pie chart\n", "plt.show()\n" ] }, { "cell_type": "markdown", "id": "d2d8141f-3264-4c1d-8962-4766cae4c734", "metadata": {}, "source": [ "## 4.4 - Correlating preceding n-grams with final POS tagging" ] }, { "cell_type": "markdown", "id": "4f532fce-fbd1-47c7-a879-0de9f684f0a0", "metadata": {}, "source": [ "This script examines how the part-of-speech (POS) tag sequence preceding the word \"ἕως\" relates to the POS tag of \"ἕως\" itself when it appears as the last word of an n-gram. It iterates through the n-grams, extracting the POS tags that precede \"ἕως\" and the POS tag assigned to \"ἕως\". These pairs are then grouped and counted, and a heatmap shows how the preceding POS tag sequences are distributed across the POS tags of \"ἕως\", revealing the grammatical patterns that lead up to each usage." ] }, { "cell_type": "code", "execution_count": 51, "id": "f9afd4f5-514e-4ad3-84d8-7e7be752b2bd", "metadata": {}, "outputs": [ { "data": { "image/png": "", "text/plain": [ "
" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "import pandas as pd\n", "import matplotlib.pyplot as plt\n", "import seaborn as sns\n", "from collections import Counter\n", "\n", "# Define the target word in unicode\n", "targetWord = 'ἕως'\n", "\n", "# Prepare lists to store preceding POS tags and the POS of \"ἕως\"\n", "precedingPOSTags = []\n", "finalPOSForTargetWord = []\n", "\n", "# Find n-grams where \"ἕως\" is the last word\n", "for nGramData in allNGrams:\n", "    words = [wordData['wordText'] for wordData in nGramData]\n", "    if words[-1] == targetWord:  # Only include if \"ἕως\" is the last word\n", "        posTags = [wordData['posTag'] for wordData in nGramData]\n", "        precedingPOSTags.append(' '.join(posTags[:-1]))  # Collect preceding POS tags\n", "        finalPOSForTargetWord.append(posTags[-1])  # Collect POS tag of \"ἕως\"\n", "\n", "# Create a DataFrame to store the relationships\n", "df = pd.DataFrame({\n", "    'Preceding POS Tags': precedingPOSTags,\n", "    'Final POS (ἕως)': finalPOSForTargetWord\n", "})\n", "\n", "# Count occurrences of preceding POS tag sequences grouped by the POS of \"ἕως\"\n", "groupedData = df.groupby(['Final POS (ἕως)', 'Preceding POS Tags']).size().unstack(fill_value=0)\n", "\n", "# Plot a heatmap to visualize the relationship between preceding POS tags and the POS of \"ἕως\"\n", "plt.figure(figsize=(12, 8))\n", "sns.heatmap(groupedData, annot=True, fmt=\"d\", cmap=\"Blues\", cbar=True)\n", "plt.title(\"Preceding POS tag sequences correlated with POS of 'ἕως'\")\n", "plt.xlabel(\"Preceding POS tags\")\n", "plt.ylabel(\"Final POS (ἕως)\")\n", "plt.xticks(rotation=90)\n", "plt.yticks(rotation=0)\n", "plt.show()\n" ] }, { "cell_type": "markdown", "id": "1fc93973-e993-47b9-8aee-29e5e44eee89", "metadata": { "tags": [] }, "source": [ "# 5 - Required libraries \n", "##### [Back to TOC](#TOC)\n", "\n", "The scripts in this notebook require, besides `text-fabric`, the following Python libraries to be installed in your environment:\n", "\n", "    pandas\n", "    matplotlib\n", "    seaborn\n", "\n", "The `collections` module is part of the Python standard library and needs no installation. You can install any missing library from within Jupyter Notebook using either `pip` or `pip3`." ] }, { "cell_type": "markdown", "id": "8031bf58-0511-4fa9-964a-e3c3a86d5416", "metadata": {}, "source": [ "# 6 - Notebook and environment details\n", "##### [Back to TOC](#TOC)\n", "\n", "
\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
Author: Tony Jurg
Version: 1.0
Date: 20 October 2024
\n", "
" ] }, { "cell_type": "markdown", "id": "c197770b-f8f9-4479-8f93-b389ee4c1156", "metadata": {}, "source": [ "The following cell displays the active Anaconda environment along with a list of all installed packages and their versions within that environment." ] }, { "cell_type": "code", "execution_count": 49, "id": "f4c5a91f-f6de-4159-9869-109ad425cf03", "metadata": { "editable": true, "scrolled": true, "slideshow": { "slide_type": "" }, "tags": [] }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Text-Fabric * C:\\Users\\tonyj\\anaconda3\\envs\\Text-Fabric\n" ] }, { "data": { "text/html": [ "
Click to view installed packages
# packages in environment at C:\\Users\\tonyj\\anaconda3\\envs\\Text-Fabric:\r\n",
       "#\r\n",
       "# Name                    Version                   Build  Channel\r\n",
       "anyio                     4.6.2.post1        pyhd8ed1ab_0    conda-forge\r\n",
       "argon2-cffi               23.1.0             pyhd8ed1ab_0    conda-forge\r\n",
       "argon2-cffi-bindings      21.2.0          py312h4389bb4_5    conda-forge\r\n",
       "arrow                     1.3.0              pyhd8ed1ab_0    conda-forge\r\n",
       "asttokens                 2.4.1              pyhd8ed1ab_0    conda-forge\r\n",
       "async-lru                 2.0.4              pyhd8ed1ab_0    conda-forge\r\n",
       "attrs                     24.2.0             pyh71513ae_0    conda-forge\r\n",
       "babel                     2.14.0             pyhd8ed1ab_0    conda-forge\r\n",
       "beautifulsoup4            4.12.3             pyha770c72_0    conda-forge\r\n",
       "bleach                    6.1.0              pyhd8ed1ab_0    conda-forge\r\n",
       "blinker                   1.8.2                    pypi_0    pypi\r\n",
       "brotli-python             1.1.0           py312h275cf98_2    conda-forge\r\n",
       "bzip2                     1.0.8                h2466b09_7    conda-forge\r\n",
       "ca-certificates           2024.8.30            h56e8100_0    conda-forge\r\n",
       "cached-property           1.5.2                hd8ed1ab_1    conda-forge\r\n",
       "cached_property           1.5.2              pyha770c72_1    conda-forge\r\n",
       "certifi                   2024.8.30          pyhd8ed1ab_0    conda-forge\r\n",
       "cffi                      1.17.1          py312h4389bb4_0    conda-forge\r\n",
       "charset-normalizer        3.4.0              pyhd8ed1ab_0    conda-forge\r\n",
       "click                     8.1.7                    pypi_0    pypi\r\n",
       "colorama                  0.4.6              pyhd8ed1ab_0    conda-forge\r\n",
       "comm                      0.2.2              pyhd8ed1ab_0    conda-forge\r\n",
       "contourpy                 1.3.0                    pypi_0    pypi\r\n",
       "cpython                   3.12.7          py312hd8ed1ab_0    conda-forge\r\n",
       "cycler                    0.12.1                   pypi_0    pypi\r\n",
       "debugpy                   1.8.7           py312h275cf98_0    conda-forge\r\n",
       "decorator                 5.1.1              pyhd8ed1ab_0    conda-forge\r\n",
       "defusedxml                0.7.1              pyhd8ed1ab_0    conda-forge\r\n",
       "entrypoints               0.4                pyhd8ed1ab_0    conda-forge\r\n",
       "exceptiongroup            1.2.2              pyhd8ed1ab_0    conda-forge\r\n",
       "executing                 2.1.0              pyhd8ed1ab_0    conda-forge\r\n",
       "flask                     3.0.3                    pypi_0    pypi\r\n",
       "fonttools                 4.54.1                   pypi_0    pypi\r\n",
       "fqdn                      1.5.1              pyhd8ed1ab_0    conda-forge\r\n",
       "h11                       0.14.0             pyhd8ed1ab_0    conda-forge\r\n",
       "h2                        4.1.0              pyhd8ed1ab_0    conda-forge\r\n",
       "hpack                     4.0.0              pyh9f0ad1d_0    conda-forge\r\n",
       "httpcore                  1.0.6              pyhd8ed1ab_0    conda-forge\r\n",
       "httpx                     0.27.2             pyhd8ed1ab_0    conda-forge\r\n",
       "hyperframe                6.0.1              pyhd8ed1ab_0    conda-forge\r\n",
       "idna                      3.10               pyhd8ed1ab_0    conda-forge\r\n",
       "importlib-metadata        8.5.0              pyha770c72_0    conda-forge\r\n",
       "importlib_metadata        8.5.0                hd8ed1ab_0    conda-forge\r\n",
       "importlib_resources       6.4.5              pyhd8ed1ab_0    conda-forge\r\n",
       "intel-openmp              2024.2.1          h57928b3_1083    conda-forge\r\n",
       "ipykernel                 6.29.5             pyh4bbf305_0    conda-forge\r\n",
       "ipython                   8.28.0             pyh7428d3b_0    conda-forge\r\n",
       "isoduration               20.11.0            pyhd8ed1ab_0    conda-forge\r\n",
       "itsdangerous              2.2.0                    pypi_0    pypi\r\n",
       "jedi                      0.19.1             pyhd8ed1ab_0    conda-forge\r\n",
       "jinja2                    3.1.4              pyhd8ed1ab_0    conda-forge\r\n",
       "json5                     0.9.25             pyhd8ed1ab_0    conda-forge\r\n",
       "jsonpointer               3.0.0           py312h2e8e312_1    conda-forge\r\n",
       "jsonschema                4.23.0             pyhd8ed1ab_0    conda-forge\r\n",
       "jsonschema-specifications 2024.10.1          pyhd8ed1ab_0    conda-forge\r\n",
       "jsonschema-with-format-nongpl 4.23.0               hd8ed1ab_0    conda-forge\r\n",
       "jupyter-lsp               2.2.5              pyhd8ed1ab_0    conda-forge\r\n",
       "jupyter_client            8.6.3              pyhd8ed1ab_0    conda-forge\r\n",
       "jupyter_core              5.7.2              pyh5737063_1    conda-forge\r\n",
       "jupyter_events            0.10.0             pyhd8ed1ab_0    conda-forge\r\n",
       "jupyter_server            2.14.2             pyhd8ed1ab_0    conda-forge\r\n",
       "jupyter_server_terminals  0.5.3              pyhd8ed1ab_0    conda-forge\r\n",
       "jupyterlab                4.2.5              pyhd8ed1ab_0    conda-forge\r\n",
       "jupyterlab_pygments       0.3.0              pyhd8ed1ab_1    conda-forge\r\n",
       "jupyterlab_server         2.27.3             pyhd8ed1ab_0    conda-forge\r\n",
       "kiwisolver                1.4.7                    pypi_0    pypi\r\n",
       "krb5                      1.21.3               hdf4eb48_0    conda-forge\r\n",
       "libblas                   3.9.0              24_win64_mkl    conda-forge\r\n",
       "libcblas                  3.9.0              24_win64_mkl    conda-forge\r\n",
       "libexpat                  2.6.3                he0c23c2_0    conda-forge\r\n",
       "libffi                    3.4.2                h8ffe710_5    conda-forge\r\n",
       "libhwloc                  2.11.1          default_h8125262_1000    conda-forge\r\n",
       "libiconv                  1.17                 hcfcfb64_2    conda-forge\r\n",
       "liblapack                 3.9.0              24_win64_mkl    conda-forge\r\n",
       "libsodium                 1.0.20               hc70643c_0    conda-forge\r\n",
       "libsqlite                 3.46.1               h2466b09_0    conda-forge\r\n",
       "libxml2                   2.12.7               h0f24e4e_4    conda-forge\r\n",
       "libzlib                   1.3.1                h2466b09_2    conda-forge\r\n",
       "markdown                  3.7                      pypi_0    pypi\r\n",
       "markdown2                 2.5.1                    pypi_0    pypi\r\n",
       "markupsafe                3.0.1           py312h31fea79_1    conda-forge\r\n",
       "matplotlib                3.9.2                    pypi_0    pypi\r\n",
       "matplotlib-inline         0.1.7              pyhd8ed1ab_0    conda-forge\r\n",
       "mistune                   3.0.2              pyhd8ed1ab_0    conda-forge\r\n",
       "mkl                       2024.1.0           h66d3029_694    conda-forge\r\n",
       "nbclient                  0.10.0             pyhd8ed1ab_0    conda-forge\r\n",
       "nbconvert-core            7.16.4             pyhd8ed1ab_1    conda-forge\r\n",
       "nbformat                  5.10.4             pyhd8ed1ab_0    conda-forge\r\n",
       "nest-asyncio              1.6.0              pyhd8ed1ab_0    conda-forge\r\n",
       "notebook                  7.2.2              pyhd8ed1ab_0    conda-forge\r\n",
       "notebook-shim             0.2.4              pyhd8ed1ab_0    conda-forge\r\n",
       "numpy                     2.1.2           py312hf10105a_0    conda-forge\r\n",
       "openssl                   3.3.2                h2466b09_0    conda-forge\r\n",
       "overrides                 7.7.0              pyhd8ed1ab_0    conda-forge\r\n",
       "packaging                 24.1               pyhd8ed1ab_0    conda-forge\r\n",
       "pandas                    2.2.3           py312h72972c8_1    conda-forge\r\n",
       "pandocfilters             1.5.0              pyhd8ed1ab_0    conda-forge\r\n",
       "parso                     0.8.4              pyhd8ed1ab_0    conda-forge\r\n",
       "pickleshare               0.7.5                   py_1003    conda-forge\r\n",
       "pillow                    11.0.0                   pypi_0    pypi\r\n",
       "pip                       24.2               pyh8b19718_1    conda-forge\r\n",
       "pkgutil-resolve-name      1.3.10             pyhd8ed1ab_1    conda-forge\r\n",
       "platformdirs              4.3.6              pyhd8ed1ab_0    conda-forge\r\n",
       "prometheus_client         0.21.0             pyhd8ed1ab_0    conda-forge\r\n",
       "prompt-toolkit            3.0.48             pyha770c72_0    conda-forge\r\n",
       "psutil                    6.0.0           py312h4389bb4_2    conda-forge\r\n",
       "pthreads-win32            2.9.1                h2466b09_4    conda-forge\r\n",
       "pure_eval                 0.2.3              pyhd8ed1ab_0    conda-forge\r\n",
       "pycparser                 2.22               pyhd8ed1ab_0    conda-forge\r\n",
       "pygments                  2.18.0             pyhd8ed1ab_0    conda-forge\r\n",
       "pyparsing                 3.2.0                    pypi_0    pypi\r\n",
       "pysocks                   1.7.1              pyh0701188_6    conda-forge\r\n",
       "python                    3.12.7          hce54a09_0_cpython    conda-forge\r\n",
       "python-dateutil           2.9.0              pyhd8ed1ab_0    conda-forge\r\n",
       "python-fastjsonschema     2.20.0             pyhd8ed1ab_0    conda-forge\r\n",
       "python-json-logger        2.0.7              pyhd8ed1ab_0    conda-forge\r\n",
       "python-tzdata             2024.2             pyhd8ed1ab_0    conda-forge\r\n",
       "python_abi                3.12                    5_cp312    conda-forge\r\n",
       "pytz                      2024.1             pyhd8ed1ab_0    conda-forge\r\n",
       "pywin32                   307             py312h275cf98_3    conda-forge\r\n",
       "pywinpty                  2.0.13          py312h275cf98_1    conda-forge\r\n",
       "pyyaml                    6.0.2           py312h4389bb4_1    conda-forge\r\n",
       "pyzmq                     26.2.0          py312hd7027bb_3    conda-forge\r\n",
       "referencing               0.35.1             pyhd8ed1ab_0    conda-forge\r\n",
       "requests                  2.32.3             pyhd8ed1ab_0    conda-forge\r\n",
       "rfc3339-validator         0.1.4              pyhd8ed1ab_0    conda-forge\r\n",
       "rfc3986-validator         0.1.1              pyh9f0ad1d_0    conda-forge\r\n",
       "rpds-py                   0.20.0          py312h2615798_1    conda-forge\r\n",
       "seaborn                   0.13.2                   pypi_0    pypi\r\n",
       "send2trash                1.8.3              pyh5737063_0    conda-forge\r\n",
       "setuptools                75.1.0             pyhd8ed1ab_0    conda-forge\r\n",
       "six                       1.16.0             pyh6c4a22f_0    conda-forge\r\n",
       "sniffio                   1.3.1              pyhd8ed1ab_0    conda-forge\r\n",
       "soupsieve                 2.5                pyhd8ed1ab_1    conda-forge\r\n",
       "stack_data                0.6.2              pyhd8ed1ab_0    conda-forge\r\n",
       "tbb                       2021.13.0            hc790b64_0    conda-forge\r\n",
       "terminado                 0.18.1             pyh5737063_0    conda-forge\r\n",
       "text-fabric               12.5.5                   pypi_0    pypi\r\n",
       "tinycss2                  1.3.0              pyhd8ed1ab_0    conda-forge\r\n",
       "tk                        8.6.13               h5226925_1    conda-forge\r\n",
       "tomli                     2.0.2              pyhd8ed1ab_0    conda-forge\r\n",
       "tornado                   6.4.1           py312h4389bb4_1    conda-forge\r\n",
       "traitlets                 5.14.3             pyhd8ed1ab_0    conda-forge\r\n",
       "types-python-dateutil     2.9.0.20241003     pyhff2d567_0    conda-forge\r\n",
       "typing-extensions         4.12.2               hd8ed1ab_0    conda-forge\r\n",
       "typing_extensions         4.12.2             pyha770c72_0    conda-forge\r\n",
       "typing_utils              0.1.0              pyhd8ed1ab_0    conda-forge\r\n",
       "tzdata                    2024b                hc8b5060_0    conda-forge\r\n",
       "ucrt                      10.0.22621.0         h57928b3_1    conda-forge\r\n",
       "uri-template              1.3.0              pyhd8ed1ab_0    conda-forge\r\n",
       "urllib3                   2.2.3              pyhd8ed1ab_0    conda-forge\r\n",
       "vc                        14.3                h8a93ad2_22    conda-forge\r\n",
       "vc14_runtime              14.40.33810         hcc2c482_22    conda-forge\r\n",
       "vs2015_runtime            14.40.33810         h3bf8584_22    conda-forge\r\n",
       "wcwidth                   0.2.13             pyhd8ed1ab_0    conda-forge\r\n",
       "webcolors                 24.8.0             pyhd8ed1ab_0    conda-forge\r\n",
       "webencodings              0.5.1              pyhd8ed1ab_2    conda-forge\r\n",
       "websocket-client          1.8.0              pyhd8ed1ab_0    conda-forge\r\n",
       "werkzeug                  3.0.4                    pypi_0    pypi\r\n",
       "wheel                     0.44.0             pyhd8ed1ab_0    conda-forge\r\n",
       "win_inet_pton             1.1.0              pyh7428d3b_7    conda-forge\r\n",
       "winpty                    0.4.3                         4    conda-forge\r\n",
       "xz                        5.2.6                h8d14728_0    conda-forge\r\n",
       "yaml                      0.2.5                h8ffe710_2    conda-forge\r\n",
       "zeromq                    4.3.5                ha9f60a1_6    conda-forge\r\n",
       "zipp                      3.20.2             pyhd8ed1ab_0    conda-forge\r\n",
       "zstandard                 0.23.0          py312h7606c53_1    conda-forge\r\n",
       "zstd                      1.5.6                h0ea2cb4_0    conda-forge\r\n",
       "
" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "import subprocess\n", "from IPython.display import display, HTML\n", "\n", "# Display the active conda environment\n", "!conda env list | findstr \"*\"\n", "\n", "# Run conda list and capture the output\n", "condaListOutput = subprocess.check_output(\"conda list\", shell=True).decode(\"utf-8\")\n", "\n", "# Wrap the output with <details> and <summary> HTML tags\n", "htmlOutput = \"<details><summary>Click to view installed packages</summary><pre>\"\n",
    "htmlOutput += condaListOutput\n",
    "htmlOutput += \"</pre></details>\"\n", "\n", "# Display the HTML in the notebook\n", "display(HTML(htmlOutput))" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.12.7" } }, "nbformat": 4, "nbformat_minor": 5 }