{ "cells": [ { "cell_type": "markdown", "id": "7e52c6a3-aa47-4eaf-b79f-e499e16c19c3", "metadata": {}, "source": [ "# Check feature monad" ] }, { "cell_type": "markdown", "id": "8c835412-8a8f-4ff5-b403-d1e33d7d336d", "metadata": {}, "source": [ "In corpus linguistics, a \"monad\" is a discrete unit of textual analysis. In this Text-Fabric dataset, the `monad` feature holds the sequence number of each individual word within the entire corpus, the New Testament.\n", "\n", "The values in the `monad.tf` file should therefore increase by exactly one from each word to the next, starting at 1. This script checks that." ] }, { "cell_type": "code", "execution_count": 9, "id": "28d3f866-d3bc-4427-8216-0c04cac7d79d", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "The file ..//tf//0.4//monad.tf exists and will be checked\n", "Analyzing: .............\n", " Filecontent is OK (i.e. sequential)\n" ] } ], "source": [ "import os\n", "\n", "# Relative path to the 'monad' tf file to check\n", "FileToCheck = \"..//tf//0.4//monad.tf\"\n", "\n", "def process_file(file_path):\n", "    PreviousMonadValue = 0\n", "    Result = \"Filecontent is OK (i.e. sequential)\"\n", "    with open(file_path, \"r\", encoding=\"utf-8\") as file:\n", "        print(\"Analyzing: \", end='', flush=True)\n", "        for line in file:\n", "            if line.startswith(\"@\"):\n", "                continue  # Skip metadata lines, which start with \"@\"\n", "            if line.startswith(\"\\n\"):\n", "                continue  # Skip empty lines\n", "            CurrentMonadValue = int(line.strip())\n", "            if CurrentMonadValue % 10000 == 0:\n", "                print(\".\", end='', flush=True)  # Progress dot for every 10000th monad value\n", "            if CurrentMonadValue - 1 == PreviousMonadValue:\n", "                PreviousMonadValue = CurrentMonadValue\n", "            else:\n", "                # Use an f-string: concatenating str and int raises a TypeError\n", "                Result = f\"Found something wrong: monad {CurrentMonadValue} after {PreviousMonadValue}\"\n", "                break\n", "    print(\"\\n\", Result)\n", "\n", "# Main part\n", "# First check whether the file exists, then check its content\n", "if os.path.exists(FileToCheck):\n", "    print(f\"The file {FileToCheck} exists and will be checked\")\n", "    process_file(FileToCheck)\n", "else:\n", "    print(f\"Could not find file {FileToCheck}.\")" ] }, { "cell_type": "code", "execution_count": null, "id": "8ed4bfe2-2124-4c8a-8717-01ef0e650be3", "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.9.12" } }, "nbformat": 4, "nbformat_minor": 5 }