{ "cells": [ { "cell_type": "markdown", "id": "7f891ec4-ea4a-4e21-b87a-4c163946080d", "metadata": { "slideshow": { "slide_type": "skip" } }, "source": [ "# [Doc4TF](https://github.com/tonyjurg/Doc4TF)\n", "#### *Automatic creation of feature documentation for existing Text-Fabric datasets*\n", "\n", "Version: 0.3 (Jan. 24, 2024); fixing bug [10](https://github.com/tonyjurg/Doc4TF/issues/10) (Feb. 2, 2024)" ] }, { "cell_type": "markdown", "id": "da55f600-c935-4b2e-835f-e8eeaad7d38c", "metadata": { "jp-MarkdownHeadingCollapsed": true, "slideshow": { "slide_type": "slide" }, "tags": [] }, "source": [ "## Table of content \n", "* 1 - Introduction\n", "* 2 - Setting up the environment\n", "* 3 - Load Text-Fabric data\n", "* 4 - Creation of the dataset\n", " * 4.1 - Setting up some global variables\n", " * 4.2 - Store all relevant data into a dictionary\n", "* 5 - Create the documentation pages\n", " * 5.1 - Create the set of feature pages\n", " * 5.2 - Create the index pages\n", "* 6 - Licence" ] }, { "cell_type": "markdown", "id": "411d018e-e242-48de-9117-4e2f95f055fc", "metadata": { "jp-MarkdownHeadingCollapsed": true, "tags": [] }, "source": [ "# 1 - Introduction \n", "##### [Back to TOC](#TOC)\n", "\n", "Ideally, a comprehensive documentation set should be created as part of developing a Text-Fabric dataset. However, in practice, this is not always completed during the initial phase or after changes to features. This Jupyter Notebook contains Python code to automatically generate (and thus ensure consistency) a documentation set for any [Text-Fabric](https://github.com/annotation/text-fabric) dataset. It serves as a robust starting point for the development of a brand new documentation set or as validation for an existing one. One major advantage is that the resulting documentation set is fully hyperlinked, a task that can be laborious if done manually.\n", "\n", "The main steps in producing the documentation set are:\n", "* Load a Text-Fabric database\n", "* Execute the code pressent in the subsequent cells. The code will:\n", " * Construct the python dictionarie stroring relevant data from the TF datase \n", " * Create separate files for each feature\n", " * Create a set of overview pages sorting the nodes accordingly \n", " \n", "The output format can be either Markdown, the standard for feature documentation stored on GitHub using its on-site processor, or HTML, which facilitates local storage and browsing with any web browser." ] }, { "cell_type": "markdown", "id": "0901649a-bf63-485f-9499-8311210b6ef7", "metadata": {}, "source": [ "# 2. Setting up the environment\n", "##### [Back to TOC](#TOC)" ] }, { "cell_type": "markdown", "id": "fcac99b8-f133-4fdc-ad10-97427954dee8", "metadata": {}, "source": [ "Your environment should (for obvious reasons) include the Python package `Text-Fabric`. If not installed yet, it can be installed using `pip`. Further it is required to be able to invoke the Text-Fabric data set (either from an online resource, or from a localy stored copy). There are no further requirements as the scripts basicly operate 'stand alone'. 
" ] }, { "cell_type": "markdown", "id": "ab0e6ff5-4f6b-4c2f-b6cb-db2a039d042a", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "# 3 - Load Text-Fabric data \n", "##### [Back to TOC](#TOC)" ] }, { "cell_type": "markdown", "id": "ceb50c9d-31c1-42d9-8e89-1c56b88d4e5f", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "At this step, the Text-Fabric dataset is loaded, which embedded data will be used to create a documentation set. For various options regarding other possible storage locations, see the documentation for function [`use`](https://annotation.github.io/text-fabric/tf/app.html#tf.app.use)." ] }, { "cell_type": "code", "execution_count": 1, "id": "177f91d0-0baf-45eb-8450-dcce0e6b4b86", "metadata": { "slideshow": { "slide_type": "slide" }, "tags": [] }, "outputs": [], "source": [ "%load_ext autoreload\n", "%autoreload 2" ] }, { "cell_type": "code", "execution_count": 2, "id": "20a63ffb-0b5d-4586-bce3-bcb39a84273d", "metadata": { "slideshow": { "slide_type": "slide" }, "tags": [] }, "outputs": [], "source": [ "# Loading the Text-Fabric code\n", "# Note: it is assumed Text-Fabric is installed in your environment\n", "from tf.fabric import Fabric\n", "from tf.app import use" ] }, { "cell_type": "code", "execution_count": 3, "id": "1f2d822d-b47a-4766-8226-8157799740c0", "metadata": { "scrolled": true, "slideshow": { "slide_type": "slide" }, "tags": [] }, "outputs": [ { "data": { "text/markdown": [ "**Locating corpus resources ...**" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "app: ~/text-fabric-data/github/saulocantanhede/tfgreek2/app" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "data: ~/text-fabric-data/github/saulocantanhede/tfgreek2/tf/0.5.5" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "name": "stdout", "output_type": "stream", "text": [ " | 29s T sibling from ~/text-fabric-data/github/saulocantanhede/tfgreek2/tf/0.5.5\n" ] }, { "data": { "text/html": [ "\n", " TF: TF API 12.2.2, saulocantanhede/tfgreek2/app v3, Search Reference
\n", " Data: saulocantanhede - tfgreek2 0.5.5, Character table, Feature docs
\n", "
Node types\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", "\n", "\n", " \n", " \n", " \n", " \n", "\n", "\n", "\n", " \n", " \n", " \n", " \n", "\n", "\n", "\n", " \n", " \n", " \n", " \n", "\n", "\n", "\n", " \n", " \n", " \n", " \n", "\n", "\n", "\n", " \n", " \n", " \n", " \n", "\n", "\n", "\n", " \n", " \n", " \n", " \n", "\n", "\n", "\n", " \n", " \n", " \n", " \n", "\n", "\n", "\n", " \n", " \n", " \n", " \n", "\n", "\n", "\n", " \n", " \n", " \n", " \n", "\n", "\n", "\n", " \n", " \n", " \n", " \n", "\n", "
Name# of nodes# slots / node% coverage
book275102.93100
chapter260529.92100
verse794417.34100
sentence1976713.79198
group89647.0246
clause304797.19159
wg1068686.88533
phrase664241.9393
subphrase1190131.59138
word1377791.00100
\n", " Sets: no custom sets
\n", " Features:
\n", "
Nestle 1904 Greek New Testament\n", "
\n", "\n", "
\n", "
\n", "after\n", "
\n", "
str
\n", "\n", " material after the end of the word\n", "\n", "
\n", "\n", "
\n", " \n", "
int
\n", "\n", " 1 if it is an apposition container\n", "\n", "
\n", "\n", "
\n", "
\n", "articular\n", "
\n", "
int
\n", "\n", " 1 if the sentence, group, clause, phrase or wg has an article\n", "\n", "
\n", "\n", "
\n", "
\n", "before\n", "
\n", "
str
\n", "\n", " this is XML attribute before\n", "\n", "
\n", "\n", "
\n", "
\n", "book\n", "
\n", "
str
\n", "\n", " book name (full name)\n", "\n", "
\n", "\n", "
\n", "
\n", "bookshort\n", "
\n", "
str
\n", "\n", " book name (abbreviated) from ref attribute in xml\n", "\n", "
\n", "\n", "
\n", "
\n", "case\n", "
\n", "
str
\n", "\n", " grammatical case\n", "\n", "
\n", "\n", "
\n", "
\n", "chapter\n", "
\n", "
int
\n", "\n", " chapter number, from ref attribute in xml\n", "\n", "
\n", "\n", "
\n", "
\n", "clausetype\n", "
\n", "
str
\n", "\n", " clause type\n", "\n", "
\n", "\n", "
\n", "
\n", "cls\n", "
\n", "
str
\n", "\n", " this is XML attribute cls\n", "\n", "
\n", "\n", "
\n", "
\n", "cltype\n", "
\n", "
str
\n", "\n", " clause type\n", "\n", "
\n", "\n", "
\n", "
\n", "criticalsign\n", "
\n", "
str
\n", "\n", " this is XML attribute criticalsign\n", "\n", "
\n", "\n", "
\n", "
\n", "crule\n", "
\n", "
str
\n", "\n", " clause rule (from xml attribute Rule)\n", "\n", "
\n", "\n", "
\n", "
\n", "degree\n", "
\n", "
str
\n", "\n", " grammatical degree\n", "\n", "
\n", "\n", "
\n", "
\n", "discontinuous\n", "
\n", "
int
\n", "\n", " 1 if the word is out of sequence in the xml\n", "\n", "
\n", "\n", "
\n", "
\n", "domain\n", "
\n", "
str
\n", "\n", " domain\n", "\n", "
\n", "\n", "
\n", "
\n", "framespec\n", "
\n", "
str
\n", "\n", " this is XML attribute framespec\n", "\n", "
\n", "\n", "
\n", "
\n", "function\n", "
\n", "
str
\n", "\n", " this is XML attribute function\n", "\n", "
\n", "\n", "
\n", "
\n", "gender\n", "
\n", "
str
\n", "\n", " grammatical gender\n", "\n", "
\n", "\n", "
\n", "
\n", "gloss\n", "
\n", "
str
\n", "\n", " short translation\n", "\n", "
\n", "\n", "
\n", "
\n", "id\n", "
\n", "
str
\n", "\n", " xml id\n", "\n", "
\n", "\n", "
\n", "
\n", "junction\n", "
\n", "
str
\n", "\n", " type of junction\n", "\n", "
\n", "\n", "
\n", "
\n", "lang\n", "
\n", "
str
\n", "\n", " language the text is in\n", "\n", "
\n", "\n", "
\n", "
\n", "lemma\n", "
\n", "
str
\n", "\n", " lexical lemma\n", "\n", "
\n", "\n", "
\n", "
\n", "ln\n", "
\n", "
str
\n", "\n", " ln\n", "\n", "
\n", "\n", "
\n", "
\n", "mood\n", "
\n", "
str
\n", "\n", " verbal mood\n", "\n", "
\n", "\n", "
\n", "
\n", "morph\n", "
\n", "
str
\n", "\n", " morphological code\n", "\n", "
\n", "\n", "
\n", "
\n", "nodeid\n", "
\n", "
int
\n", "\n", " node id (as in the XML source data\n", "\n", "
\n", "\n", "
\n", "
\n", "normalized\n", "
\n", "
str
\n", "\n", " lemma normalized\n", "\n", "
\n", "\n", "
\n", "
\n", "note\n", "
\n", "
str
\n", "\n", " annotation of linguistic nature\n", "\n", "
\n", "\n", "
\n", "
\n", "num\n", "
\n", "
int
\n", "\n", " generated number (not in xml): book: (Matthew=1, Mark=2, ..., Revelation=27); sentence: numbered per chapter; word: numbered per verse.\n", "\n", "
\n", "\n", "
\n", "
\n", "number\n", "
\n", "
str
\n", "\n", " grammatical number\n", "\n", "
\n", "\n", "
\n", "
\n", "otype\n", "
\n", "
str
\n", "\n", " \n", "\n", "
\n", "\n", "
\n", "
\n", "person\n", "
\n", "
str
\n", "\n", " grammatical person\n", "\n", "
\n", "\n", "
\n", "
\n", "punctuation\n", "
\n", "
str
\n", "\n", " this is XML attribute punctuation\n", "\n", "
\n", "\n", "
\n", "
\n", "ref\n", "
\n", "
str
\n", "\n", " biblical reference with word counting\n", "\n", "
\n", "\n", "
\n", "
\n", "referent\n", "
\n", "
str
\n", "\n", " number of referent\n", "\n", "
\n", "\n", "
\n", "
\n", "rela\n", "
\n", "
str
\n", "\n", " this is XML attribute rela\n", "\n", "
\n", "\n", "
\n", "
\n", "role\n", "
\n", "
str
\n", "\n", " role\n", "\n", "
\n", "\n", "
\n", "
\n", "rule\n", "
\n", "
str
\n", "\n", " syntactical rule\n", "\n", "
\n", "\n", "
\n", "
\n", "strong\n", "
\n", "
int
\n", "\n", " strong number\n", "\n", "
\n", "\n", "
\n", "
\n", "subjrefspec\n", "
\n", "
str
\n", "\n", " this is XML attribute subjrefspec\n", "\n", "
\n", "\n", "
\n", "
\n", "tense\n", "
\n", "
str
\n", "\n", " verbal tense\n", "\n", "
\n", "\n", "
\n", "
\n", "text\n", "
\n", "
str
\n", "\n", " the text of a word\n", "\n", "
\n", "\n", "
\n", "
\n", "typ\n", "
\n", "
str
\n", "\n", " this is XML attribute typ\n", "\n", "
\n", "\n", "
\n", "
\n", "type\n", "
\n", "
str
\n", "\n", " morphological type (on word), syntactical type (on sentence, group, clause, phrase or wg)\n", "\n", "
\n", "\n", "
\n", "
\n", "unicode\n", "
\n", "
str
\n", "\n", " word in unicode characters plus material after it\n", "\n", "
\n", "\n", "
\n", "
\n", "variant\n", "
\n", "
str
\n", "\n", " this is XML attribute variant\n", "\n", "
\n", "\n", "
\n", "
\n", "verse\n", "
\n", "
int
\n", "\n", " verse number, from ref attribute in xml\n", "\n", "
\n", "\n", "
\n", "
\n", "voice\n", "
\n", "
str
\n", "\n", " verbal voice\n", "\n", "
\n", "\n", "
\n", "
\n", "frame\n", "
\n", "
str
\n", "\n", " frame\n", "\n", "
\n", "\n", "
\n", "
\n", "oslots\n", "
\n", "
none
\n", "\n", " \n", "\n", "
\n", "\n", "
\n", "
\n", "parent\n", "
\n", "
none
\n", "\n", " parent relationship between words\n", "\n", "
\n", "\n", "
\n", "
\n", "sibling\n", "
\n", "
int
\n", "\n", " this is XML attribute sibling\n", "\n", "
\n", "\n", "
\n", "
\n", "subjref\n", "
\n", "
none
\n", "\n", " number of subject referent\n", "\n", "
\n", "\n", "
\n", "
\n", "\n", " Settings:
specified
  1. apiVersion: 3
  2. appName: saulocantanhede/tfgreek2
  3. appPath: C:/Users/tonyj/text-fabric-data/github/saulocantanhede/tfgreek2/app
  4. commit: 8c4bc8e48e66e32f614b5966813104c0894ad822
  5. css: ''
  6. dataDisplay:
    • excludedFeatures: []
    • noneValues:
      • none
      • unknown
      • no value
      • NA
    • sectionSep1: @
    • textFormat: text-orig-full
  7. docs:
    • docPage: about
    • featureBase: {docBase}/transcription.md
    • featurePage: transcription
  8. interfaceDefaults: {fmt: text-orig-full}
  9. isCompatible: True
  10. local: local
  11. localDir: C:/Users/tonyj/text-fabric-data/github/saulocantanhede/tfgreek2/_temp
  12. provenanceSpec:
    • corpus: Nestle 1904 Greek New Testament
    • doi: 10.5281/zenodo.notyet
    • moduleSpecs: []
    • org: saulocantanhede
    • relative: /tf
    • repo: tfgreek2
    • version: 0.5.5
    • webBase: https://learner.bible/text/show_text/nestle1904/
    • webHint: Show this on the website
    • webLang: en
    • webUrl: https://learner.bible/text/show_text/nestle1904/<1>/<2>/<3>
    • webUrlLex: {webBase}/word?version={version}&id=<lid>
  13. release: 0.5.4
  14. typeDisplay:
    • clause:
      • condense: True
      • label: #{num}: {cls} {rule} {junction}
      • style: ''
    • group:
      • label: #{num}:
      • style: ''
    • phrase:
      • condense: True
      • label: #{num}: {function} {role} {rule} {type}
      • style: ''
    • sentence:
      • label: #{num}: {rule}
      • style: ''
    • verse:
      • condense: True
      • label: {book} {chapter}:{verse}
      • style: ''
    • wg:
      • condense: True
      • label: #{num}: {type} {role} {rule} {junction}
      • style: ''
    • word: {base: True}
  15. writing: grc
\n" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "\n", "\n" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "
TF API: names N F E L T S C TF Fs Fall Es Eall Cs Call directly usable

" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "# load the app and data\n", "A = use (\"saulocantanhede/tfgreek2\", version=\"0.5.5\", hoist=globals())" ] }, { "cell_type": "markdown", "id": "141768d0-5893-4d44-8667-22dd129a1159", "metadata": {}, "source": [ "# 4 - Creation of the dataset" ] }, { "cell_type": "markdown", "id": "0feac423-fe52-4d79-bd4a-36621c9f795e", "metadata": { "tags": [] }, "source": [ "## 4.1 - Setting up some global variables\n", "##### [Back to TOC](#TOC)" ] }, { "cell_type": "code", "execution_count": 4, "id": "dd126f03-48a6-4980-bac5-8ec86d6c840e", "metadata": { "tags": [] }, "outputs": [], "source": [ "# If the following variable is set, it will be used as title for all pages. It is intended to the describe the dataset in one line\n", "# customPageTitleMD=\"N1904 Greek New Testament [saulocantanhede/tfgreek2 - 0.5.4](https://github.com/saulocantanhede/tfgreek2)\"\n", "# customPageTitleHTML=\"N1904 Greek New Testament saulocantanhede/tfgreek2 - 0.5.4\"\n", "\n", "# Specify the location to store the resulting files, relative to the location of this notebook (without a trailing slash).\n", "resultLocation = \"results\"\n", "\n", "# Type of output format ('html' for HTML, 'md' for Mark Down, or 'both' for both HTML and Mark Down)\n", "typeOutput='both'\n", "\n", "# HTML table style definition (only relevant for HTML output format)\n", "htmlStyle=''\n", "\n", "# Limit the number of entries in the frequency tables per node type on each feature description page to this number\n", "tableLimit=10\n", "\n", "# This switch can be set to 'True' if you want additional information, such as dictionary entries and file details, to be printed. For basic output, set this switch to 'False'.\n", "verbose=False\n", "\n", "# The version number of the script\n", "scriptVersion=\"0.3\"\n", "scriptDate=\"Jan. 24, 2024\"\n", "\n", "\n", "# Create the footers for MD and HTML, include today's date\n", "from datetime import datetime\n", "today = datetime.today()\n", "formatted_date = today.strftime(\"%b. %d, %Y\")\n", "footerMD=f'\\n\\nCreated on {formatted_date} using [Doc4TF version {scriptVersion} ({scriptDate})](https://github.com/tonyjurg/Doc4TF)'\n", "footerHTML=f'\\n

Created on {formatted_date} using Doc4TF - version {scriptVersion} ({scriptDate})

'" ] }, { "cell_type": "markdown", "id": "9c1870d9-1821-4c14-a57c-f379e92af3ed", "metadata": {}, "source": [ "## 4.2 - Store all relevant data into a dictionary\n", "##### [Back to TOC](#TOC)" ] }, { "cell_type": "markdown", "id": "5f85a8fc-05c8-4b64-b990-f5d7113b7cf9", "metadata": {}, "source": [ "The following will create a dictionary containing all relevant information for the loaded node and edge features." ] }, { "cell_type": "code", "execution_count": 5, "id": "e2fd353b-f615-4954-917e-65e61ff8a3d8", "metadata": { "tags": [] }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Gathering generic details\n", "Analyzing Node Features: ..................................................\n", "Analyzing Edge Features: .....\n", "Finished\n" ] } ], "source": [ "# Initialize an empty dictionary to store feature data\n", "featureDict = {}\n", "\n", "# Function to get feature description from metadata\n", "def get_feature_description(metaData):\n", " return metaData.get('description', \"No feature description\")\n", "\n", "# Function to set data type based on 'valueType' in metadata\n", "def set_data_type(metaData):\n", " if 'valueType' in metaData:\n", " return \"String\" if metaData[\"valueType\"] == 'str' else \"Integer\"\n", " return \"Unknown\"\n", "\n", "# Function to process and add feature data to the dictionary\n", "def process_feature(feature, featureType, featureMethod):\n", " # Obtain the meta data\n", " featureMetaData = featureMethod(feature).meta\n", " featureDescription = get_feature_description(featureMetaData)\n", " dataType = set_data_type(featureMetaData)\n", "\n", " # Initialize dictionary to store feature frequency data\n", " featureFrequencyDict = {}\n", "\n", " # Skip for specific features based on type\n", " if not (featureType == 'Node' and feature == 'otype') and not (featureType == 'Edge' and feature == 'oslots'):\n", " for nodeType in F.otype.all:\n", " frequencyLists = featureMethod(feature).freqList(nodeType)\n", " if not isinstance(frequencyLists, int):\n", " if len(frequencyLists)!=0:\n", " featureFrequencyDict[nodeType] = {'nodetype': nodeType, 'freq': frequencyLists[:tableLimit]}\n", " elif isinstance(frequencyLists, int):\n", " if frequencyLists != 0:\n", " featureFrequencyDict[nodeType] = {'nodetype': nodeType, 'freq': [(\"Link\", frequencyLists)]}\n", "\n", " # Add processed feature data to the main dictionary\n", " featureDict[feature] = {'name': feature, 'descr': featureDescription, 'type': featureType, 'datatype': dataType, 'freqlist': featureFrequencyDict}\n", " \n", "########################################################\n", "# MAIN FUNCTION #\n", "########################################################\n", "\n", "########################################################\n", "# Gather general information #\n", "########################################################\n", "\n", "print('Gathering generic details')\n", "\n", "# Initialize default values\n", "corpusName = A.appName\n", "liveName = ''\n", "versionName = A.version\n", "\n", "# Trying to locate corpus information\n", "if A.provenance:\n", " for parts in A.provenance[0]: \n", " if isinstance(parts, tuple):\n", " key, value = parts[0], parts[1]\n", " if verbose: print (f'General info: {key}={value}')\n", " if key == 'corpus': corpusName = value\n", " if key == 'version': versionName = value\n", " # value for live is a tuple\n", " if key == 'live': liveName=value[1]\n", "if liveName is not None and len(liveName)>1:\n", " # an URL was found\n", " pageTitleMD = f'Doc4TF pages 
for [{corpusName}]({liveName}) (version {versionName})'\n", " pageTitleHTML = f'

Doc4TF pages for {corpusName} (version {versionName})

'\n", "else:\n", " # No URL found\n", " pageTitleMD = f'Doc4TF pages for {corpusName} (version {versionName})'\n", " pageTitleHTML = f'

Doc4TF pages for {corpusName} (version {versionName})

'\n", "\n", "# Overwrite in case user provided a title\n", "if 'customPageTitleMD' in globals():\n", " pageTitleMD = customPageTitleMD\n", "if 'customPageTitleHTML' in globals():\n", " pageTitleHTML = customPageTitleHTML\n", "\n", " \n", "########################################################\n", "# Processing node features #\n", "########################################################\n", "\n", "print('Analyzing Node Features: ', end='')\n", "for nodeFeature in Fall():\n", " if not verbose: print('.', end='') # Progress indicator\n", " process_feature(nodeFeature, 'Node', Fs)\n", " if verbose: print(f'\\nFeature {nodeFeature} = {featureDict[nodeFeature]}\\n') # Print feature data if verbose\n", "\n", "########################################################\n", "# Processing edge features #\n", "########################################################\n", "\n", "print('\\nAnalyzing Edge Features: ', end='')\n", "for edgeFeature in Eall():\n", " if not verbose: print('.', end='') # Progress indicator\n", " process_feature(edgeFeature, 'Edge', Es)\n", " if verbose: print(f'\\nFeature {edgeFeature} = {featureDict[edgeFeature]}\\n') # Print feature data if verbose\n", "\n", "print('\\nFinished')" ] }, { "cell_type": "markdown", "id": "cabf4dd5-b817-42cf-8644-6431970085a7", "metadata": {}, "source": [ "# 5 - Create the documentation pages\n", "\n", "Two types of pages will be created:\n", " * Feature description pages (one per feature)\n", " * Set of index pages (linking to the feature pages)" ] }, { "cell_type": "markdown", "id": "3b9abcc1-1362-42a0-85db-8234766a41f5", "metadata": {}, "source": [ "## 5.1 - Create the set of feature pages\n", "##### [Back to TOC](#TOC)" ] }, { "cell_type": "code", "execution_count": 6, "id": "c4b5e148-c336-4d95-a238-5fe79fa9525c", "metadata": { "tags": [] }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Finished (written 110 html and md files to directory C:\\Users\\tonyj\\OneDrive\\Documents\\GitHub\\Doc4TF\\results)\n" ] } ], "source": [ "import os\n", "\n", "# Initialize a counter for the number of files created\n", "filesCreated = 0\n", "# Get the current working directory and append a backslash for path building\n", "pathFull = os.getcwd() + '\\\\'\n", "\n", "# Iterating over each feature in the feature dictionary\n", "for featureName, featureData in featureDict.items():\n", " # Extracting various properties of each feature\n", " featureDescription = featureData.get('descr')\n", " featureType = featureData.get('type')\n", " featureDataType = featureData.get('datatype')\n", " \n", " # Initializing strings to accumulate HTML and Markdown content\n", " nodeListHTML = nodeListMD = ''\n", " tableListHTML = tableListMD = ''\n", " frequencyData = featureData.get('freqlist')\n", "\n", " # Processing frequency data for each node\n", " for node in frequencyData:\n", " # Building HTML and Markdown links for each node\n", " nodeListHTML += f' {node}'\n", " nodeListMD += f' [`{node}`](featurebynodetype.md#{node}) '\n", "\n", " # Starting HTML and Markdown tables for frequency data\n", " tableListHTML += f'

Frequency for nodetype {node}

'\n", " tableListMD += f'### Frequency for nodetype [{node}](featurebynodetype.md#{node})\\nValue|Occurrences\\n---|---\\n'\n", "\n", " # Populating tables with frequency data\n", " itemData = frequencyData.get(node).get('freq')\n", " for item in itemData:\n", " handleSpace = item[0] if item[0] != ' ' else 'space' # prevent garbling of tables where the value itself is a space\n", " tableListHTML += f''\n", " tableListMD += f'{handleSpace}|{item[1]}\\n'\n", " tableListHTML += f'
ValueOccurrences
{handleSpace}{item[1]}
\\n'\n", "\n", " # Creating info blocks for HTML and Markdown\n", " infoBlockHTML = f'
Data typeFeature typeAvailable for nodes
{featureDataType}{featureType}{nodeListHTML}
'\n", " infoBlockMD = f'Data type|Feature type|Available for nodes\\n---|---|---\\n[`{featureDataType}`](featurebydatatype.md#{featureDataType.lower()})|[`{featureType}`](featurebytype.md#{featureType.lower()})|{nodeListMD}'\n", "\n", " # Outputting in Markdown format\n", " if typeOutput in ('md','both'):\n", " pageMD = f'{pageTitleMD}\\n# Feature: {featureName}\\n{infoBlockMD}\\n## Description\\n{featureDescription}\\n## Feature Values\\n{tableListMD} {footerMD} '\n", " fileNameMD = os.path.join(resultLocation, f\"{featureName}.md\")\n", " try:\n", " with open(fileNameMD, \"w\", encoding=\"utf-8\") as file:\n", " file.write(pageMD)\n", " filesCreated += 1\n", " # Log if verbose mode is on\n", " if verbose: print(f\"Markdown content written to {pathFull + fileNameMD}\")\n", " except Exception as e:\n", " print(f\"Exception: {e}\")\n", " break # Stops execution on encountering an exception\n", "\n", " # Outputting in HTML format\n", " if typeOutput in ('html','both'):\n", " pageHTML = f'{htmlStyle}

{pageTitleHTML}

\\n

Feature: {featureName}

\\n{infoBlockHTML}\\n

Description

\\n

{featureDescription}

\\n

Feature Values

\\n{tableListHTML} {footerHTML}'\n", " fileNameHTML = os.path.join(resultLocation, f\"{featureName}.htm\")\n", " try:\n", " with open(fileNameHTML, \"w\", encoding=\"utf-8\") as file:\n", " file.write(pageHTML)\n", " filesCreated += 1\n", " # Log if verbose mode is on\n", " if verbose: print(f\"HTML content written to {pathFull + fileNameHTML}\")\n", " except Exception as e:\n", " print(f\"Exception: {e}\")\n", " break # Stops execution on encountering an exception\n", "\n", "# Reporting the number of files created\n", "if filesCreated != 0:\n", " print(f'Finished (written {filesCreated} {\"html and md\" if typeOutput == \"both\" else typeOutput} files to directory {pathFull + resultLocation})')\n", "else:\n", " print('No files written')" ] }, { "cell_type": "markdown", "id": "cb9ec070-ae31-4ab5-b356-1b07666e5026", "metadata": {}, "source": [ "## 5.2 - Create the index pages\n", "##### [Back to TOC](#TOC)" ] }, { "cell_type": "code", "execution_count": 7, "id": "bb568470-a2ed-446b-86c9-e298fb4bc6b3", "metadata": { "tags": [] }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Finished (written 6 html and md files to directory C:\\Users\\tonyj\\OneDrive\\Documents\\GitHub\\Doc4TF\\results)\n" ] } ], "source": [ "# Initialize a counter for the number of files created\n", "filesCreated = 0\n", "\n", "\n", "# Example data function to create a list of examples for a given feature\n", "def exampleData(feature):\n", " # Check if the feature exists in featureDict and has non-empty freqlist.\n", " if feature in featureDict and featureDict[feature]['freqlist']:\n", " # Get the first value from the freqlist\n", " freq_list = next(iter(featureDict[feature]['freqlist'].values()))['freq']\n", " # Use list comprehension to create the example list. \n", " example_list = ' '.join(f'`{item[0]}`' for item in freq_list[:4])\n", " return example_list\n", " else:\n", " return \"No values\"\n", "\n", " \n", "def writeToFile(fileName, content, fileType, verbose):\n", " \"\"\"\n", " Writes content to a file.\n", " :param fileName: The name of the file to write to.\n", " :param content: The content to write.\n", " :param fileType: The type of file (e.g., 'md' for Markdown, 'html' for HTML).\n", " :param verbose: If True, prints a message upon successful writing.\n", " \"\"\"\n", " global filesCreated\n", " try:\n", " with open(fileName, \"w\", encoding=\"utf-8\") as file:\n", " file.write(content)\n", " filesCreated+=1\n", " if verbose: \n", " print(f\"{fileType.upper()} content written to {fileName}\")\n", " except Exception as e:\n", " print(f\"Exception while writing {fileType.upper()} file: {e}\")\n", "\n", "# Set up some lists\n", "nodeFeatureList = []\n", "typeFeatureList = []\n", "dataTypeFeatureList = []\n", "\n", "for featureName, featureData in featureDict.items():\n", " typeFeatureList.append((featureName,featureData.get('type')))\n", " dataTypeFeatureList.append((featureName,featureData.get('datatype')))\n", " for node in featureData.get('freqlist'):\n", " nodeFeatureList.append((node, featureName))\n", " \n", "########################################################### \n", "# Create the page with overview per node type (e.g. word) #\n", "###########################################################\n", " \n", "pageMD=f'{pageTitleMD}\\n# Overview features per nodetype\\n'\n", "pageHTML=f'{htmlStyle}

{pageTitleHTML}

\\n

Overview features per nodetype

'\n", "\n", "# Sort the list alphabetically based on the second item of each tuple (featureName)\n", "nodeFeatureList = sorted(nodeFeatureList, key=lambda x: x[1])\n", "# Iterate over node types\n", "for NodeType in F.otype.all:\n", " NodeItemTextMD=f'## {NodeType}\\n\\nFeature|Featuretype|Datatype|Description|Examples\\n---|---|---|---|---\\n' \n", " NodeItemTextHTML=f'

{NodeType}

\\n\\n' \n", " for node, feature in nodeFeatureList:\n", " if node == NodeType: \n", " featureData=featureDict[feature]\n", " featureDescription=featureData.get('descr') \n", " featureType=featureData.get('type') \n", " featureDataType=featureData.get('datatype')\n", " NodeItemTextMD+=f\"[`{feature}`]({feature}.md#readme)|[`{featureType}`](featurebytype.md#{featureType})|[`{featureDataType}`](featurebydatatype.md#{featureDataType})|{featureDescription}|{exampleData(feature)}\\n\"\n", " NodeItemTextHTML+=f\"\\n\"\n", " NodeItemTextHTML+=f\"
FeatureFeaturetypeDatatypeDescriptionExamples
{feature}{featureType}{featureDataType}{featureDescription}{exampleData(feature)}
\\n\"\n", " pageHTML+=NodeItemTextHTML\n", " pageMD+=NodeItemTextMD\n", " \n", "pageHTML+=f'{footerHTML}'\n", "pageMD+=f'{footerMD}'\n", " \n", "# Write to file by calling common function\n", "if typeOutput in ('md','both'):\n", " fileNameMD = os.path.join(resultLocation, \"featurebynodetype.md\")\n", " writeToFile(fileNameMD, pageMD, 'md', verbose)\n", "\n", "if typeOutput in ('html','both'):\n", " fileNameHTML = os.path.join(resultLocation, \"featurebynodetype.htm\")\n", " writeToFile(fileNameHTML, pageHTML, 'html', verbose)\n", "\n", "####################################################################\n", "# Create the page with overview per data type (string or integer) #\n", "####################################################################\n", "\n", "pageMD=f'{pageTitleMD}\\n# Overview features per datatype\\n'\n", "pageHTML=f'{htmlStyle}

{pageTitleHTML}

\\n

Overview features per datatype'\n", "\n", "# Sort the list alphabetically based on the second item of each tuple (featureName)\n", "dataTypeFeatureList = sorted(dataTypeFeatureList, key=lambda x: x[1])\n", "\n", "DataItemTextMD=DataItemTextHTML=''\n", "for DataType in ('Integer','String'):\n", " DataItemTextMD=f'## {DataType}\\n\\nFeature|Featuretype|Available on nodes|Description|Examples\\n---|---|---|---|---\\n' \n", " DataItemTextHTML=f'

{DataType}

\\n\\n' \n", " for feature, featureDataType in dataTypeFeatureList: \n", " if featureDataType == DataType: \n", " featureDescription=featureDict[feature].get('descr') \n", " featureType=featureDict[feature].get('type') \n", " nodeListMD=nodeListHTML=''\n", " for thisNode in featureDict[feature]['freqlist']:\n", " nodeListMD+=f'[`{thisNode}`](featurebynodetype.md#{thisNode}) '\n", " nodeListHTML+=f'{thisNode} '\n", " DataItemTextMD+=f\"[`{feature}`]({feature}.md#readme)|[`{featureType}`](featurebytype.md#{featureType.lower()})|{nodeListMD}|{featureDescription}|{exampleData(feature)}\\n\"\n", " DataItemTextHTML+=f\"\\n\"\n", " DataItemTextHTML+=f\"
FeatureFeaturetypeAvailable on nodesDescriptionExamples
{feature}{featureType}{nodeListHTML}{featureDescription}{exampleData(feature)}
\\n\"\n", " pageMD+=DataItemTextMD\n", " pageHTML+=DataItemTextHTML\n", "\n", "pageHTML+=f'{footerHTML}'\n", "pageMD+=f'{footerMD}'\n", " \n", " \n", "# Write to file by calling common function\n", "if typeOutput in ('md','both'):\n", " fileNameMD = os.path.join(resultLocation, \"featurebydatatype.md\")\n", " writeToFile(fileNameMD, pageMD, 'md', verbose)\n", "\n", "if typeOutput in ('html','both'):\n", " fileNameHTML = os.path.join(resultLocation, \"featurebydatatype.htm\")\n", " writeToFile(fileNameHTML, pageHTML, 'html', verbose)\n", " \n", "##################################################################\n", "# Create the page with overview per feature type (edge or node) #\n", "##################################################################\n", "\n", "pageMD=f'{pageTitleMD}\\n# Overview features per type\\n'\n", "pageHTML=f'{htmlStyle}

{pageTitleHTML}

\\n

Overview features per type'\n", "\n", "# Sort the list alphabetically based on the second item of each tuple (nodetype)\n", "typeFeatureList = sorted(typeFeatureList, key=lambda x: x[1])\n", "for featureType in ('Node','Edge'):\n", " ItemTextMD=f'## {featureType}\\n\\nFeature|Datatype|Available on nodes|Description|Examples\\n---|---|---|---|---\\n' \n", " ItemTextHTML=f'

{featureType}

\\n\\n' \n", " for thisFeature, thisFeatureType in typeFeatureList: \n", " if featureType == thisFeatureType:\n", " featureDescription=featureDict[thisFeature].get('descr')\n", " featureDataType=featureDict[thisFeature].get('datatype')\n", " nodeListMD=nodeListHTML=''\n", " for thisNode in featureDict[thisFeature]['freqlist']:\n", " nodeListMD+=f'[`{thisNode}`](featurebynodetype.md#{thisNode}) '\n", " nodeListHTML+=f'{thisNode} '\n", " ItemTextMD+=f\"[`{thisFeature}`]({thisFeature}.md#readme)|[`{featureDataType}`](featurebydatatype.md#{featureDataType.lower()})|{nodeListMD}|{featureDescription}|{exampleData(thisFeature)}\\n\"\n", " ItemTextHTML+=f\"\\n\"\n", " ItemTextHTML+=f\"
FeatureDatatypeAvailable on nodesDescriptionExamples
{thisFeature}{featureDataType}{nodeListHTML}{featureDescription}{exampleData(thisFeature)}
\\n\"\n", " pageMD+=ItemTextMD\n", " pageHTML+=ItemTextHTML\n", "\n", "pageHTML+=f'{footerHTML}'\n", "pageMD+=f'{footerMD}'\n", "\n", "# Write to file by calling common function\n", "if typeOutput in ('md','both'):\n", " fileNameMD = os.path.join(resultLocation, \"featurebytype.md\")\n", " writeToFile(fileNameMD, pageMD, 'md', verbose)\n", "\n", "if typeOutput in ('html','both'):\n", " fileNameHTML = os.path.join(resultLocation, \"featurebytype.htm\")\n", " writeToFile(fileNameHTML, pageHTML, 'html', verbose)\n", " \n", "\n", "# Reporting the number of files created\n", "if filesCreated != 0:\n", " print(f'Finished (written {filesCreated} {\"html and md\" if typeOutput == \"both\" else typeOutput} files to directory {pathFull + resultLocation})')\n", "else:\n", " print('No files written')" ] }, { "cell_type": "markdown", "id": "8cad157a-f173-4e6e-8d53-42b6a4437537", "metadata": {}, "source": [ "# 6 - License\n", "##### [Back to TOC](#TOC)" ] }, { "cell_type": "markdown", "id": "1096c784-46e4-4158-aaf6-5c15310cee67", "metadata": {}, "source": [ "Licenced under [Creative Commons Attribution 4.0 International (CC BY 4.0)](https://github.com/tonyjurg/Doc4TF/blob/main/LICENCE.md)" ] } ], "metadata": { "celltoolbar": "Geen", "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.11.5" } }, "nbformat": 4, "nbformat_minor": 5 }