{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "\n", "\n", "\n", "# Tutorial\n", "\n", "This notebook gets you started with using\n", "[Text-Fabric](https://annotation.github.io/text-fabric/) for coding in the Quran.\n", "\n", "Familiarity with the underlying\n", "[data model](https://annotation.github.io/text-fabric/tf/about/datamodel.html)\n", "is recommended." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Installing Text-Fabric\n", "\n", "### Python\n", "\n", "You need to have Python on your system. Most systems have it out of the box,\n", "but alas, that is often Python 2, and we need at least Python **3.6**.\n", "\n", "Install it from [python.org](https://www.python.org) or from\n", "[Anaconda](https://www.anaconda.com/download).\n", "\n", "### TF itself\n", "\n", "```\n", "pip3 install text-fabric\n", "```\n", "\n", "### Jupyter notebook\n", "\n", "You need [Jupyter](http://jupyter.org).\n", "\n", "If it is not already installed:\n", "\n", "```\n", "pip3 install jupyter\n", "```" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Tip\n", "If you start computing with this tutorial, first copy its parent directory to somewhere else,\n", "outside your `quran` directory.\n", "If you pull changes from the `quran` repository later, your work will not be overwritten.\n", "Where you put your tutorial directory is up to you.\n", "It will work from any directory."
] }, { "cell_type": "code", "execution_count": 1, "metadata": { "ExecuteTime": { "end_time": "2018-10-18T10:40:34.922214Z", "start_time": "2018-10-18T10:40:34.901689Z" } }, "outputs": [], "source": [ "%load_ext autoreload\n", "%autoreload 2" ] }, { "cell_type": "code", "execution_count": 2, "metadata": { "ExecuteTime": { "end_time": "2018-10-18T10:40:36.468086Z", "start_time": "2018-10-18T10:40:36.442023Z" } }, "outputs": [], "source": [ "import os\n", "import collections" ] }, { "cell_type": "code", "execution_count": 3, "metadata": { "ExecuteTime": { "end_time": "2018-10-18T10:40:37.964346Z", "start_time": "2018-10-18T10:40:37.346091Z" } }, "outputs": [], "source": [ "from tf.app import use" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Quran data\n", "\n", "Text-Fabric will fetch a standard set of features for you from the newest GitHub release binaries.\n", "\n", "The data will be stored in the `text-fabric-data` directory in your home directory." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Load Features\n", "The data of the corpus is organized in features.\n", "They are *columns* of data.\n", "Think of the text as a gigantic spreadsheet, where row 1 corresponds to the\n", "first word, row 2 to the second word, and so on, for all 100,000+ words.\n", "\n", "The transliterated letters of each word are in the column `ascii` of that spreadsheet.\n", "\n", "The corpus contains about 40 columns, not only for the words, but also for\n", "textual objects, such as *suras*, *ayas*, and *word groups*.\n", "\n", "Instead of putting that information in one big table, the data is organized in separate columns.\n", "We call those columns **features**." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "For the very latest version, use `hot`.\n", "\n", "For the latest release, use `latest`.\n", "\n", "If you have cloned the repos (TF app and data), use `clone`.\n", "\n", "If you do not want/need to upgrade, leave out the checkout specifiers."
] }, { "cell_type": "code", "execution_count": 5, "metadata": { "ExecuteTime": { "end_time": "2018-10-18T10:40:42.958397Z", "start_time": "2018-10-18T10:40:41.535490Z" } }, "outputs": [ { "data": { "text/html": [ "TF-app: ~/text-fabric-data/q-ran/quran/app" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "data: ~/text-fabric-data/q-ran/quran/tf/0.4" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "name": "stdout", "output_type": "stream", "text": [ "This is Text-Fabric 9.2.3\n", "Api reference : https://annotation.github.io/text-fabric/tf/cheatsheet.html\n", "\n", "40 features found and 0 ignored\n" ] }, { "data": { "text/html": [ "Text-Fabric: Text-Fabric API 9.2.3, q-ran/quran/app v3, Search Reference
Data: QURAN, Character table, Feature docs
Features:
\n", "
Quran\n", "
\n", "\n", "
\n", "
\n", "a\n", "
\n", "
str
\n", "
\n", " not yet understood\n", "
\n", "\n", "
\n", "
acronym:
\n", "
quran
\n", "
\n", "\n", "
\n", "
convertedBy:
\n", "
Dirk Roorda and Cornelis van Lit
\n", "
\n", "\n", "
\n", "
createdBy:
\n", "
Kais Dukes
\n", "
\n", "\n", "
\n", "
createdDate:
\n", "
2011
\n", "
\n", "\n", "
\n", "
dateWritten:
\n", "
2019-05-13T07:17:55Z
\n", "
\n", "\n", "
\n", "
license1:
\n", "
Open Source, unspecified, see http://corpus.quran.com/releasenotes.jsp
\n", "
\n", "\n", "
\n", "
license2:
\n", "
Creative Commons BY-ND 3.0 Unported
\n", "
\n", "\n", "
\n", "
source1:
\n", "
Morphology: Quranic Arabic Corpus 0.4 (2011) by Kais Dukes
\n", "
\n", "\n", "
\n", "
source1Url:
\n", "
http://corpus.quran.com
\n", "
\n", "\n", "
\n", "
source2:
\n", "
Text: Tanzil Quran Text (Uthmani, version 1.0.2)
\n", "
\n", "\n", "
\n", "
source2Url:
\n", "
http://tanzil.net/docs/home
\n", "
\n", "\n", "
\n", "
writtenBy:
\n", "
Text-Fabric
\n", "
\n", "\n", "
\n", "
\n", "
\n", "\n", "
\n", "
\n", "ascii\n", "
\n", "
str
\n", "
\n", " transliterated text of word\n", "
\n", "\n", "
\n", "
acronym:
\n", "
quran
\n", "
\n", "\n", "
\n", "
convertedBy:
\n", "
Dirk Roorda and Cornelis van Lit
\n", "
\n", "\n", "
\n", "
createdBy:
\n", "
Kais Dukes
\n", "
\n", "\n", "
\n", "
createdDate:
\n", "
2011
\n", "
\n", "\n", "
\n", "
dateWritten:
\n", "
2019-05-13T07:17:55Z
\n", "
\n", "\n", "
\n", "
license1:
\n", "
Open Source, unspecified, see http://corpus.quran.com/releasenotes.jsp
\n", "
\n", "\n", "
\n", "
license2:
\n", "
Creative Commons BY-ND 3.0 Unported
\n", "
\n", "\n", "
\n", "
source1:
\n", "
Morphology: Quranic Arabic Corpus 0.4 (2011) by Kais Dukes
\n", "
\n", "\n", "
\n", "
source1Url:
\n", "
http://corpus.quran.com
\n", "
\n", "\n", "
\n", "
source2:
\n", "
Text: Tanzil Quran Text (Uthmani, version 1.0.2)
\n", "
\n", "\n", "
\n", "
source2Url:
\n", "
http://tanzil.net/docs/home
\n", "
\n", "\n", "
\n", "
writtenBy:
\n", "
Text-Fabric
\n", "
\n", "\n", "
\n", "
\n", "
\n", "\n", "
\n", "
\n", "ax\n", "
\n", "
str
\n", "
\n", " not yet understood\n", "
\n", "\n", "
\n", "
acronym:
\n", "
quran
\n", "
\n", "\n", "
\n", "
convertedBy:
\n", "
Dirk Roorda and Cornelis van Lit
\n", "
\n", "\n", "
\n", "
createdBy:
\n", "
Kais Dukes
\n", "
\n", "\n", "
\n", "
createdDate:
\n", "
2011
\n", "
\n", "\n", "
\n", "
dateWritten:
\n", "
2019-05-13T07:17:55Z
\n", "
\n", "\n", "
\n", "
license1:
\n", "
Open Source, unspecified, see http://corpus.quran.com/releasenotes.jsp
\n", "
\n", "\n", "
\n", "
license2:
\n", "
Creative Commons BY-ND 3.0 Unported
\n", "
\n", "\n", "
\n", "
source1:
\n", "
Morphology: Quranic Arabic Corpus 0.4 (2011) by Kais Dukes
\n", "
\n", "\n", "
\n", "
source1Url:
\n", "
http://corpus.quran.com
\n", "
\n", "\n", "
\n", "
source2:
\n", "
Text: Tanzil Quran Text (Uthmani, version 1.0.2)
\n", "
\n", "\n", "
\n", "
source2Url:
\n", "
http://tanzil.net/docs/home
\n", "
\n", "\n", "
\n", "
writtenBy:
\n", "
Text-Fabric
\n", "
\n", "\n", "
\n", "
\n", "
\n", "\n", "
\n", "
\n", "case\n", "
\n", "
str
\n", "
\n", " case of word\n", "
\n", "\n", "
\n", "
acronym:
\n", "
quran
\n", "
\n", "\n", "
\n", "
convertedBy:
\n", "
Dirk Roorda and Cornelis van Lit
\n", "
\n", "\n", "
\n", "
createdBy:
\n", "
Kais Dukes
\n", "
\n", "\n", "
\n", "
createdDate:
\n", "
2011
\n", "
\n", "\n", "
\n", "
dateWritten:
\n", "
2019-05-13T07:17:55Z
\n", "
\n", "\n", "
\n", "
license1:
\n", "
Open Source, unspecified, see http://corpus.quran.com/releasenotes.jsp
\n", "
\n", "\n", "
\n", "
license2:
\n", "
Creative Commons BY-ND 3.0 Unported
\n", "
\n", "\n", "
\n", "
source1:
\n", "
Morphology: Quranic Arabic Corpus 0.4 (2011) by Kais Dukes
\n", "
\n", "\n", "
\n", "
source1Url:
\n", "
http://corpus.quran.com
\n", "
\n", "\n", "
\n", "
source2:
\n", "
Text: Tanzil Quran Text (Uthmani, version 1.0.2)
\n", "
\n", "\n", "
\n", "
source2Url:
\n", "
http://tanzil.net/docs/home
\n", "
\n", "\n", "
\n", "
writtenBy:
\n", "
Text-Fabric
\n", "
\n", "\n", "
\n", "
\n", "
\n", "\n", "
\n", "
\n", "component\n", "
\n", "
str
\n", "
\n", " role of the word in its word group (prefix, main, or suffix)\n", "
\n", "\n", "
\n", "
acronym:
\n", "
quran
\n", "
\n", "\n", "
\n", "
convertedBy:
\n", "
Dirk Roorda and Cornelis van Lit
\n", "
\n", "\n", "
\n", "
createdBy:
\n", "
Kais Dukes
\n", "
\n", "\n", "
\n", "
createdDate:
\n", "
2011
\n", "
\n", "\n", "
\n", "
dateWritten:
\n", "
2019-05-13T07:17:55Z
\n", "
\n", "\n", "
\n", "
license1:
\n", "
Open Source, unspecified, see http://corpus.quran.com/releasenotes.jsp
\n", "
\n", "\n", "
\n", "
license2:
\n", "
Creative Commons BY-ND 3.0 Unported
\n", "
\n", "\n", "
\n", "
source1:
\n", "
Morphology: Quranic Arabic Corpus 0.4 (2011) by Kais Dukes
\n", "
\n", "\n", "
\n", "
source1Url:
\n", "
http://corpus.quran.com
\n", "
\n", "\n", "
\n", "
source2:
\n", "
Text: Tanzil Quran Text (Uthmani, version 1.0.2)
\n", "
\n", "\n", "
\n", "
source2Url:
\n", "
http://tanzil.net/docs/home
\n", "
\n", "\n", "
\n", "
writtenBy:
\n", "
Text-Fabric
\n", "
\n", "\n", "
\n", "
\n", "
\n", "\n", "
\n", "
\n", "definite\n", "
\n", "
int
\n", "
\n", " whether the word is definite\n", "
\n", "\n", "
\n", "
acronym:
\n", "
quran
\n", "
\n", "\n", "
\n", "
convertedBy:
\n", "
Dirk Roorda and Cornelis van Lit
\n", "
\n", "\n", "
\n", "
createdBy:
\n", "
Kais Dukes
\n", "
\n", "\n", "
\n", "
createdDate:
\n", "
2011
\n", "
\n", "\n", "
\n", "
dateWritten:
\n", "
2019-05-13T07:17:55Z
\n", "
\n", "\n", "
\n", "
license1:
\n", "
Open Source, unspecified, see http://corpus.quran.com/releasenotes.jsp
\n", "
\n", "\n", "
\n", "
license2:
\n", "
Creative Commons BY-ND 3.0 Unported
\n", "
\n", "\n", "
\n", "
source1:
\n", "
Morphology: Quranic Arabic Corpus 0.4 (2011) by Kais Dukes
\n", "
\n", "\n", "
\n", "
source1Url:
\n", "
http://corpus.quran.com
\n", "
\n", "\n", "
\n", "
source2:
\n", "
Text: Tanzil Quran Text (Uthmani, version 1.0.2)
\n", "
\n", "\n", "
\n", "
source2Url:
\n", "
http://tanzil.net/docs/home
\n", "
\n", "\n", "
\n", "
writtenBy:
\n", "
Text-Fabric
\n", "
\n", "\n", "
\n", "
\n", "
\n", "\n", "
\n", "
\n", "f\n", "
\n", "
str
\n", "
\n", " not yet understood\n", "
\n", "\n", "
\n", "
acronym:
\n", "
quran
\n", "
\n", "\n", "
\n", "
convertedBy:
\n", "
Dirk Roorda and Cornelis van Lit
\n", "
\n", "\n", "
\n", "
createdBy:
\n", "
Kais Dukes
\n", "
\n", "\n", "
\n", "
createdDate:
\n", "
2011
\n", "
\n", "\n", "
\n", "
dateWritten:
\n", "
2019-05-13T07:17:55Z
\n", "
\n", "\n", "
\n", "
license1:
\n", "
Open Source, unspecified, see http://corpus.quran.com/releasenotes.jsp
\n", "
\n", "\n", "
\n", "
license2:
\n", "
Creative Commons BY-ND 3.0 Unported
\n", "
\n", "\n", "
\n", "
source1:
\n", "
Morphology: Quranic Arabic Corpus 0.4 (2011) by Kais Dukes
\n", "
\n", "\n", "
\n", "
source1Url:
\n", "
http://corpus.quran.com
\n", "
\n", "\n", "
\n", "
source2:
\n", "
Text: Tanzil Quran Text (Uthmani, version 1.0.2)
\n", "
\n", "\n", "
\n", "
source2Url:
\n", "
http://tanzil.net/docs/home
\n", "
\n", "\n", "
\n", "
writtenBy:
\n", "
Text-Fabric
\n", "
\n", "\n", "
\n", "
\n", "
\n", "\n", "
\n", "
\n", "formation\n", "
\n", "
str
\n", "
\n", " stem formation of verb\n", "
\n", "\n", "
\n", "
acronym:
\n", "
quran
\n", "
\n", "\n", "
\n", "
convertedBy:
\n", "
Dirk Roorda and Cornelis van Lit
\n", "
\n", "\n", "
\n", "
createdBy:
\n", "
Kais Dukes
\n", "
\n", "\n", "
\n", "
createdDate:
\n", "
2011
\n", "
\n", "\n", "
\n", "
dateWritten:
\n", "
2019-05-13T07:17:55Z
\n", "
\n", "\n", "
\n", "
license1:
\n", "
Open Source, unspecified, see http://corpus.quran.com/releasenotes.jsp
\n", "
\n", "\n", "
\n", "
license2:
\n", "
Creative Commons BY-ND 3.0 Unported
\n", "
\n", "\n", "
\n", "
source1:
\n", "
Morphology: Quranic Arabic Corpus 0.4 (2011) by Kais Dukes
\n", "
\n", "\n", "
\n", "
source1Url:
\n", "
http://corpus.quran.com
\n", "
\n", "\n", "
\n", "
source2:
\n", "
Text: Tanzil Quran Text (Uthmani, version 1.0.2)
\n", "
\n", "\n", "
\n", "
source2Url:
\n", "
http://tanzil.net/docs/home
\n", "
\n", "\n", "
\n", "
writtenBy:
\n", "
Text-Fabric
\n", "
\n", "\n", "
\n", "
\n", "
\n", "\n", "
\n", "
\n", "fx\n", "
\n", "
str
\n", "
\n", " not yet understood\n", "
\n", "\n", "
\n", "
acronym:
\n", "
quran
\n", "
\n", "\n", "
\n", "
convertedBy:
\n", "
Dirk Roorda and Cornelis van Lit
\n", "
\n", "\n", "
\n", "
createdBy:
\n", "
Kais Dukes
\n", "
\n", "\n", "
\n", "
createdDate:
\n", "
2011
\n", "
\n", "\n", "
\n", "
dateWritten:
\n", "
2019-05-13T07:17:55Z
\n", "
\n", "\n", "
\n", "
license1:
\n", "
Open Source, unspecified, see http://corpus.quran.com/releasenotes.jsp
\n", "
\n", "\n", "
\n", "
license2:
\n", "
Creative Commons BY-ND 3.0 Unported
\n", "
\n", "\n", "
\n", "
source1:
\n", "
Morphology: Quranic Arabic Corpus 0.4 (2011) by Kais Dukes
\n", "
\n", "\n", "
\n", "
source1Url:
\n", "
http://corpus.quran.com
\n", "
\n", "\n", "
\n", "
source2:
\n", "
Text: Tanzil Quran Text (Uthmani, version 1.0.2)
\n", "
\n", "\n", "
\n", "
source2Url:
\n", "
http://tanzil.net/docs/home
\n", "
\n", "\n", "
\n", "
writtenBy:
\n", "
Text-Fabric
\n", "
\n", "\n", "
\n", "
\n", "
\n", "\n", "
\n", "
\n", "gn\n", "
\n", "
str
\n", "
\n", " gender of word (masculine, feminine)\n", "
\n", "\n", "
\n", "
acronym:
\n", "
quran
\n", "
\n", "\n", "
\n", "
convertedBy:
\n", "
Dirk Roorda and Cornelis van Lit
\n", "
\n", "\n", "
\n", "
createdBy:
\n", "
Kais Dukes
\n", "
\n", "\n", "
\n", "
createdDate:
\n", "
2011
\n", "
\n", "\n", "
\n", "
dateWritten:
\n", "
2019-05-13T07:17:55Z
\n", "
\n", "\n", "
\n", "
license1:
\n", "
Open Source, unspecified, see http://corpus.quran.com/releasenotes.jsp
\n", "
\n", "\n", "
\n", "
license2:
\n", "
Creative Commons BY-ND 3.0 Unported
\n", "
\n", "\n", "
\n", "
source1:
\n", "
Morphology: Quranic Arabic Corpus 0.4 (2011) by Kais Dukes
\n", "
\n", "\n", "
\n", "
source1Url:
\n", "
http://corpus.quran.com
\n", "
\n", "\n", "
\n", "
source2:
\n", "
Text: Tanzil Quran Text (Uthmani, version 1.0.2)
\n", "
\n", "\n", "
\n", "
source2Url:
\n", "
http://tanzil.net/docs/home
\n", "
\n", "\n", "
\n", "
writtenBy:
\n", "
Text-Fabric
\n", "
\n", "\n", "
\n", "
\n", "
\n", "\n", "
\n", "
\n", "interjection\n", "
\n", "
str
\n", "
\n", " kind of interjection\n", "
\n", "\n", "
\n", "
acronym:
\n", "
quran
\n", "
\n", "\n", "
\n", "
convertedBy:
\n", "
Dirk Roorda and Cornelis van Lit
\n", "
\n", "\n", "
\n", "
createdBy:
\n", "
Kais Dukes
\n", "
\n", "\n", "
\n", "
createdDate:
\n", "
2011
\n", "
\n", "\n", "
\n", "
dateWritten:
\n", "
2019-05-13T07:17:55Z
\n", "
\n", "\n", "
\n", "
license1:
\n", "
Open Source, unspecified, see http://corpus.quran.com/releasenotes.jsp
\n", "
\n", "\n", "
\n", "
license2:
\n", "
Creative Commons BY-ND 3.0 Unported
\n", "
\n", "\n", "
\n", "
source1:
\n", "
Morphology: Quranic Arabic Corpus 0.4 (2011) by Kais Dukes
\n", "
\n", "\n", "
\n", "
source1Url:
\n", "
http://corpus.quran.com
\n", "
\n", "\n", "
\n", "
source2:
\n", "
Text: Tanzil Quran Text (Uthmani, version 1.0.2)
\n", "
\n", "\n", "
\n", "
source2Url:
\n", "
http://tanzil.net/docs/home
\n", "
\n", "\n", "
\n", "
writtenBy:
\n", "
Text-Fabric
\n", "
\n", "\n", "
\n", "
\n", "
\n", "\n", "
\n", "
\n", "l\n", "
\n", "
str
\n", "
\n", " not yet understood\n", "
\n", "\n", "
\n", "
acronym:
\n", "
quran
\n", "
\n", "\n", "
\n", "
convertedBy:
\n", "
Dirk Roorda and Cornelis van Lit
\n", "
\n", "\n", "
\n", "
createdBy:
\n", "
Kais Dukes
\n", "
\n", "\n", "
\n", "
createdDate:
\n", "
2011
\n", "
\n", "\n", "
\n", "
dateWritten:
\n", "
2019-05-13T07:17:55Z
\n", "
\n", "\n", "
\n", "
license1:
\n", "
Open Source, unspecified, see http://corpus.quran.com/releasenotes.jsp
\n", "
\n", "\n", "
\n", "
license2:
\n", "
Creative Commons BY-ND 3.0 Unported
\n", "
\n", "\n", "
\n", "
source1:
\n", "
Morphology: Quranic Arabic Corpus 0.4 (2011) by Kais Dukes
\n", "
\n", "\n", "
\n", "
source1Url:
\n", "
http://corpus.quran.com
\n", "
\n", "\n", "
\n", "
source2:
\n", "
Text: Tanzil Quran Text (Uthmani, version 1.0.2)
\n", "
\n", "\n", "
\n", "
source2Url:
\n", "
http://tanzil.net/docs/home
\n", "
\n", "\n", "
\n", "
writtenBy:
\n", "
Text-Fabric
\n", "
\n", "\n", "
\n", "
\n", "
\n", "\n", "
\n", "
\n", "lemma\n", "
\n", "
str
\n", "
\n", " lemma of word\n", "
\n", "\n", "
\n", "
acronym:
\n", "
quran
\n", "
\n", "\n", "
\n", "
convertedBy:
\n", "
Dirk Roorda and Cornelis van Lit
\n", "
\n", "\n", "
\n", "
createdBy:
\n", "
Kais Dukes
\n", "
\n", "\n", "
\n", "
createdDate:
\n", "
2011
\n", "
\n", "\n", "
\n", "
dateWritten:
\n", "
2019-05-13T07:17:55Z
\n", "
\n", "\n", "
\n", "
license1:
\n", "
Open Source, unspecified, see http://corpus.quran.com/releasenotes.jsp
\n", "
\n", "\n", "
\n", "
license2:
\n", "
Creative Commons BY-ND 3.0 Unported
\n", "
\n", "\n", "
\n", "
source1:
\n", "
Morphology: Quranic Arabic Corpus 0.4 (2011) by Kais Dukes
\n", "
\n", "\n", "
\n", "
source1Url:
\n", "
http://corpus.quran.com
\n", "
\n", "\n", "
\n", "
source2:
\n", "
Text: Tanzil Quran Text (Uthmani, version 1.0.2)
\n", "
\n", "\n", "
\n", "
source2Url:
\n", "
http://tanzil.net/docs/home
\n", "
\n", "\n", "
\n", "
writtenBy:
\n", "
Text-Fabric
\n", "
\n", "\n", "
\n", "
\n", "
\n", "\n", "
\n", "
\n", "lx\n", "
\n", "
str
\n", "
\n", " not yet understood\n", "
\n", "\n", "
\n", "
acronym:
\n", "
quran
\n", "
\n", "\n", "
\n", "
convertedBy:
\n", "
Dirk Roorda and Cornelis van Lit
\n", "
\n", "\n", "
\n", "
createdBy:
\n", "
Kais Dukes
\n", "
\n", "\n", "
\n", "
createdDate:
\n", "
2011
\n", "
\n", "\n", "
\n", "
dateWritten:
\n", "
2019-05-13T07:17:55Z
\n", "
\n", "\n", "
\n", "
license1:
\n", "
Open Source, unspecified, see http://corpus.quran.com/releasenotes.jsp
\n", "
\n", "\n", "
\n", "
license2:
\n", "
Creative Commons BY-ND 3.0 Unported
\n", "
\n", "\n", "
\n", "
source1:
\n", "
Morphology: Quranic Arabic Corpus 0.4 (2011) by Kais Dukes
\n", "
\n", "\n", "
\n", "
source1Url:
\n", "
http://corpus.quran.com
\n", "
\n", "\n", "
\n", "
source2:
\n", "
Text: Tanzil Quran Text (Uthmani, version 1.0.2)
\n", "
\n", "\n", "
\n", "
source2Url:
\n", "
http://tanzil.net/docs/home
\n", "
\n", "\n", "
\n", "
writtenBy:
\n", "
Text-Fabric
\n", "
\n", "\n", "
\n", "
\n", "
\n", "\n", "
\n", "
\n", "mood\n", "
\n", "
str
\n", "
\n", " mood of a verb (subj, jus, ...)\n", "
\n", "\n", "
\n", "
acronym:
\n", "
quran
\n", "
\n", "\n", "
\n", "
convertedBy:
\n", "
Dirk Roorda and Cornelis van Lit
\n", "
\n", "\n", "
\n", "
createdBy:
\n", "
Kais Dukes
\n", "
\n", "\n", "
\n", "
createdDate:
\n", "
2011
\n", "
\n", "\n", "
\n", "
dateWritten:
\n", "
2019-05-13T07:17:55Z
\n", "
\n", "\n", "
\n", "
license1:
\n", "
Open Source, unspecified, see http://corpus.quran.com/releasenotes.jsp
\n", "
\n", "\n", "
\n", "
license2:
\n", "
Creative Commons BY-ND 3.0 Unported
\n", "
\n", "\n", "
\n", "
source1:
\n", "
Morphology: Quranic Arabic Corpus 0.4 (2011) by Kais Dukes
\n", "
\n", "\n", "
\n", "
source1Url:
\n", "
http://corpus.quran.com
\n", "
\n", "\n", "
\n", "
source2:
\n", "
Text: Tanzil Quran Text (Uthmani, version 1.0.2)
\n", "
\n", "\n", "
\n", "
source2Url:
\n", "
http://tanzil.net/docs/home
\n", "
\n", "\n", "
\n", "
writtenBy:
\n", "
Text-Fabric
\n", "
\n", "\n", "
\n", "
\n", "
\n", "\n", "
\n", "
\n", "n\n", "
\n", "
str
\n", "
\n", " not yet understood\n", "
\n", "\n", "
\n", "
acronym:
\n", "
quran
\n", "
\n", "\n", "
\n", "
convertedBy:
\n", "
Dirk Roorda and Cornelis van Lit
\n", "
\n", "\n", "
\n", "
createdBy:
\n", "
Kais Dukes
\n", "
\n", "\n", "
\n", "
createdDate:
\n", "
2011
\n", "
\n", "\n", "
\n", "
dateWritten:
\n", "
2019-05-13T07:17:55Z
\n", "
\n", "\n", "
\n", "
license1:
\n", "
Open Source, unspecified, see http://corpus.quran.com/releasenotes.jsp
\n", "
\n", "\n", "
\n", "
license2:
\n", "
Creative Commons BY-ND 3.0 Unported
\n", "
\n", "\n", "
\n", "
source1:
\n", "
Morphology: Quranic Arabic Corpus 0.4 (2011) by Kais Dukes
\n", "
\n", "\n", "
\n", "
source1Url:
\n", "
http://corpus.quran.com
\n", "
\n", "\n", "
\n", "
source2:
\n", "
Text: Tanzil Quran Text (Uthmani, version 1.0.2)
\n", "
\n", "\n", "
\n", "
source2Url:
\n", "
http://tanzil.net/docs/home
\n", "
\n", "\n", "
\n", "
writtenBy:
\n", "
Text-Fabric
\n", "
\n", "\n", "
\n", "
\n", "
\n", "\n", "
\n", "
\n", "name\n", "
\n", "
str
\n", "
\n", " Name of sura in Arabic\n", "
\n", "\n", "
\n", "
acronym:
\n", "
quran
\n", "
\n", "\n", "
\n", "
convertedBy:
\n", "
Dirk Roorda and Cornelis van Lit
\n", "
\n", "\n", "
\n", "
createdBy:
\n", "
Kais Dukes
\n", "
\n", "\n", "
\n", "
createdDate:
\n", "
2011
\n", "
\n", "\n", "
\n", "
dateWritten:
\n", "
2019-05-13T07:17:55Z
\n", "
\n", "\n", "
\n", "
language:
\n", "
arabic
\n", "
\n", "\n", "
\n", "
license1:
\n", "
Open Source, unspecified, see http://corpus.quran.com/releasenotes.jsp
\n", "
\n", "\n", "
\n", "
license2:
\n", "
Creative Commons BY-ND 3.0 Unported
\n", "
\n", "\n", "
\n", "
source1:
\n", "
Morphology: Quranic Arabic Corpus 0.4 (2011) by Kais Dukes
\n", "
\n", "\n", "
\n", "
source1Url:
\n", "
http://corpus.quran.com
\n", "
\n", "\n", "
\n", "
source2:
\n", "
Text: Tanzil Quran Text (Uthmani, version 1.0.2)
\n", "
\n", "\n", "
\n", "
source2Url:
\n", "
http://tanzil.net/docs/home
\n", "
\n", "\n", "
\n", "
writtenBy:
\n", "
Text-Fabric
\n", "
\n", "\n", "
\n", "
\n", "
\n", "\n", "
\n", "
\n", "name@ll\n", "
\n", "
str
\n", "
\n", " Name of sura in English\n", "
\n", "\n", "
\n", "
acronym:
\n", "
quran
\n", "
\n", "\n", "
\n", "
convertedBy:
\n", "
Dirk Roorda and Cornelis van Lit
\n", "
\n", "\n", "
\n", "
createdBy:
\n", "
Kais Dukes
\n", "
\n", "\n", "
\n", "
createdDate:
\n", "
2011
\n", "
\n", "\n", "
\n", "
dateWritten:
\n", "
2019-05-13T07:17:55Z
\n", "
\n", "\n", "
\n", "
language:
\n", "
english
\n", "
\n", "\n", "
\n", "
languageCode:
\n", "
en
\n", "
\n", "\n", "
\n", "
languageEnglish:
\n", "
English
\n", "
\n", "\n", "
\n", "
license1:
\n", "
Open Source, unspecified, see http://corpus.quran.com/releasenotes.jsp
\n", "
\n", "\n", "
\n", "
license2:
\n", "
Creative Commons BY-ND 3.0 Unported
\n", "
\n", "\n", "
\n", "
source1:
\n", "
Morphology: Quranic Arabic Corpus 0.4 (2011) by Kais Dukes
\n", "
\n", "\n", "
\n", "
source1Url:
\n", "
http://corpus.quran.com
\n", "
\n", "\n", "
\n", "
source2:
\n", "
Text: Tanzil Quran Text (Uthmani, version 1.0.2)
\n", "
\n", "\n", "
\n", "
source2Url:
\n", "
http://tanzil.net/docs/home
\n", "
\n", "\n", "
\n", "
writtenBy:
\n", "
Text-Fabric
\n", "
\n", "\n", "
\n", "
\n", "
\n", "\n", "
\n", "
\n", "nameAscii\n", "
\n", "
str
\n", "
\n", " Name of sura in Arabic, transliterated\n", "
\n", "\n", "
\n", "
acronym:
\n", "
quran
\n", "
\n", "\n", "
\n", "
convertedBy:
\n", "
Dirk Roorda and Cornelis van Lit
\n", "
\n", "\n", "
\n", "
createdBy:
\n", "
Kais Dukes
\n", "
\n", "\n", "
\n", "
createdDate:
\n", "
2011
\n", "
\n", "\n", "
\n", "
dateWritten:
\n", "
2019-05-13T07:17:55Z
\n", "
\n", "\n", "
\n", "
language:
\n", "
arabic
\n", "
\n", "\n", "
\n", "
license1:
\n", "
Open Source, unspecified, see http://corpus.quran.com/releasenotes.jsp
\n", "
\n", "\n", "
\n", "
license2:
\n", "
Creative Commons BY-ND 3.0 Unported
\n", "
\n", "\n", "
\n", "
source1:
\n", "
Morphology: Quranic Arabic Corpus 0.4 (2011) by Kais Dukes
\n", "
\n", "\n", "
\n", "
source1Url:
\n", "
http://corpus.quran.com
\n", "
\n", "\n", "
\n", "
source2:
\n", "
Text: Tanzil Quran Text (Uthmani, version 1.0.2)
\n", "
\n", "\n", "
\n", "
source2Url:
\n", "
http://tanzil.net/docs/home
\n", "
\n", "\n", "
\n", "
writtenBy:
\n", "
Text-Fabric
\n", "
\n", "\n", "
\n", "
\n", "
\n", "\n", "
\n", "
\n", "nameTrans\n", "
\n", "
str
\n", "
\n", " Name of sura in Arabic, transcribed\n", "
\n", "\n", "
\n", "
acronym:
\n", "
quran
\n", "
\n", "\n", "
\n", "
convertedBy:
\n", "
Dirk Roorda and Cornelis van Lit
\n", "
\n", "\n", "
\n", "
createdBy:
\n", "
Kais Dukes
\n", "
\n", "\n", "
\n", "
createdDate:
\n", "
2011
\n", "
\n", "\n", "
\n", "
dateWritten:
\n", "
2019-05-13T07:17:55Z
\n", "
\n", "\n", "
\n", "
language:
\n", "
arabic
\n", "
\n", "\n", "
\n", "
license1:
\n", "
Open Source, unspecified, see http://corpus.quran.com/releasenotes.jsp
\n", "
\n", "\n", "
\n", "
license2:
\n", "
Creative Commons BY-ND 3.0 Unported
\n", "
\n", "\n", "
\n", "
source1:
\n", "
Morphology: Quranic Arabic Corpus 0.4 (2011) by Kais Dukes
\n", "
\n", "\n", "
\n", "
source1Url:
\n", "
http://corpus.quran.com
\n", "
\n", "\n", "
\n", "
source2:
\n", "
Text: Tanzil Quran Text (Uthmani, version 1.0.2)
\n", "
\n", "\n", "
\n", "
source2Url:
\n", "
http://tanzil.net/docs/home
\n", "
\n", "\n", "
\n", "
writtenBy:
\n", "
Text-Fabric
\n", "
\n", "\n", "
\n", "
\n", "
\n", "\n", "
\n", "
\n", "nu\n", "
\n", "
str
\n", "
\n", " number of word (singular, dual, plural)\n", "
\n", "\n", "
\n", "
acronym:
\n", "
quran
\n", "
\n", "\n", "
\n", "
convertedBy:
\n", "
Dirk Roorda and Cornelis van Lit
\n", "
\n", "\n", "
\n", "
createdBy:
\n", "
Kais Dukes
\n", "
\n", "\n", "
\n", "
createdDate:
\n", "
2011
\n", "
\n", "\n", "
\n", "
dateWritten:
\n", "
2019-05-13T07:17:55Z
\n", "
\n", "\n", "
\n", "
license1:
\n", "
Open Source, unspecified, see http://corpus.quran.com/releasenotes.jsp
\n", "
\n", "\n", "
\n", "
license2:
\n", "
Creative Commons BY-ND 3.0 Unported
\n", "
\n", "\n", "
\n", "
source1:
\n", "
Morphology: Quranic Arabic Corpus 0.4 (2011) by Kais Dukes
\n", "
\n", "\n", "
\n", "
source1Url:
\n", "
http://corpus.quran.com
\n", "
\n", "\n", "
\n", "
source2:
\n", "
Text: Tanzil Quran Text (Uthmani, version 1.0.2)
\n", "
\n", "\n", "
\n", "
source2Url:
\n", "
http://tanzil.net/docs/home
\n", "
\n", "\n", "
\n", "
writtenBy:
\n", "
Text-Fabric
\n", "
\n", "\n", "
\n", "
\n", "
\n", "\n", "
\n", "
\n", "number\n", "
\n", "
int
\n", "
\n", " Number of sura, aya, word group, or word\n", "
\n", "\n", "
\n", "
acronym:
\n", "
quran
\n", "
\n", "\n", "
\n", "
convertedBy:
\n", "
Dirk Roorda and Cornelis van Lit
\n", "
\n", "\n", "
\n", "
createdBy:
\n", "
Kais Dukes
\n", "
\n", "\n", "
\n", "
createdDate:
\n", "
2011
\n", "
\n", "\n", "
\n", "
dateWritten:
\n", "
2019-05-13T07:17:56Z
\n", "
\n", "\n", "
\n", "
license1:
\n", "
Open Source, unspecified, see http://corpus.quran.com/releasenotes.jsp
\n", "
\n", "\n", "
\n", "
license2:
\n", "
Creative Commons BY-ND 3.0 Unported
\n", "
\n", "\n", "
\n", "
source1:
\n", "
Morphology: Quranic Arabic Corpus 0.4 (2011) by Kais Dukes
\n", "
\n", "\n", "
\n", "
source1Url:
\n", "
http://corpus.quran.com
\n", "
\n", "\n", "
\n", "
source2:
\n", "
Text: Tanzil Quran Text (Uthmani, version 1.0.2)
\n", "
\n", "\n", "
\n", "
source2Url:
\n", "
http://tanzil.net/docs/home
\n", "
\n", "\n", "
\n", "
writtenBy:
\n", "
Text-Fabric
\n", "
\n", "\n", "
\n", "
\n", "
\n", "\n", "
\n", "
\n", "order\n", "
\n", "
int
\n", "
\n", " ordinal number of sura\n", "
\n", "\n", "
\n", "
acronym:
\n", "
quran
\n", "
\n", "\n", "
\n", "
convertedBy:
\n", "
Dirk Roorda and Cornelis van Lit
\n", "
\n", "\n", "
\n", "
createdBy:
\n", "
Kais Dukes
\n", "
\n", "\n", "
\n", "
createdDate:
\n", "
2011
\n", "
\n", "\n", "
\n", "
dateWritten:
\n", "
2019-05-13T07:17:56Z
\n", "
\n", "\n", "
\n", "
license1:
\n", "
Open Source, unspecified, see http://corpus.quran.com/releasenotes.jsp
\n", "
\n", "\n", "
\n", "
license2:
\n", "
Creative Commons BY-ND 3.0 Unported
\n", "
\n", "\n", "
\n", "
source1:
\n", "
Morphology: Quranic Arabic Corpus 0.4 (2011) by Kais Dukes
\n", "
\n", "\n", "
\n", "
source1Url:
\n", "
http://corpus.quran.com
\n", "
\n", "\n", "
\n", "
source2:
\n", "
Text: Tanzil Quran Text (Uthmani, version 1.0.2)
\n", "
\n", "\n", "
\n", "
source2Url:
\n", "
http://tanzil.net/docs/home
\n", "
\n", "\n", "
\n", "
writtenBy:
\n", "
Text-Fabric
\n", "
\n", "\n", "
\n", "
\n", "
\n", "\n", "
\n", "
\n", "otype\n", "
\n", "
str
\n", "
\n", " Quran: plain text plus morphological annotations at the word level\n", "
\n", "\n", "
\n", "
acronym:
\n", "
quran
\n", "
\n", "\n", "
\n", "
convertedBy:
\n", "
Dirk Roorda and Cornelis van Lit
\n", "
\n", "\n", "
\n", "
createdBy:
\n", "
Kais Dukes
\n", "
\n", "\n", "
\n", "
createdDate:
\n", "
2011
\n", "
\n", "\n", "
\n", "
dateWritten:
\n", "
2019-05-13T07:17:56Z
\n", "
\n", "\n", "
\n", "
license1:
\n", "
Open Source, unspecified, see http://corpus.quran.com/releasenotes.jsp
\n", "
\n", "\n", "
\n", "
license2:
\n", "
Creative Commons BY-ND 3.0 Unported
\n", "
\n", "\n", "
\n", "
source1:
\n", "
Morphology: Quranic Arabic Corpus 0.4 (2011) by Kais Dukes
\n", "
\n", "\n", "
\n", "
source1Url:
\n", "
http://corpus.quran.com
\n", "
\n", "\n", "
\n", "
source2:
\n", "
Text: Tanzil Quran Text (Uthmani, version 1.0.2)
\n", "
\n", "\n", "
\n", "
source2Url:
\n", "
http://tanzil.net/docs/home
\n", "
\n", "\n", "
\n", "
writtenBy:
\n", "
Text-Fabric
\n", "
\n", "\n", "
\n", "
\n", "
\n", "\n", "
\n", "
\n", "pos\n", "
\n", "
str
\n", "
\n", " part-of-speech of word, main class\n", "
\n", "\n", "
\n", "
acronym:
\n", "
quran
\n", "
\n", "\n", "
\n", "
convertedBy:
\n", "
Dirk Roorda and Cornelis van Lit
\n", "
\n", "\n", "
\n", "
createdBy:
\n", "
Kais Dukes
\n", "
\n", "\n", "
\n", "
createdDate:
\n", "
2011
\n", "
\n", "\n", "
\n", "
dateWritten:
\n", "
2019-05-13T07:17:56Z
\n", "
\n", "\n", "
\n", "
documentation:
\n", "
http://corpus.quran.com/documentation/tagset.jsp
\n", "
\n", "\n", "
\n", "
license1:
\n", "
Open Source, unspecified, see http://corpus.quran.com/releasenotes.jsp
\n", "
\n", "\n", "
\n", "
license2:
\n", "
Creative Commons BY-ND 3.0 Unported
\n", "
\n", "\n", "
\n", "
source1:
\n", "
Morphology: Quranic Arabic Corpus 0.4 (2011) by Kais Dukes
\n", "
\n", "\n", "
\n", "
source1Url:
\n", "
http://corpus.quran.com
\n", "
\n", "\n", "
\n", "
source2:
\n", "
Text: Tanzil Quran Text (Uthmani, version 1.0.2)
\n", "
\n", "\n", "
\n", "
source2Url:
\n", "
http://tanzil.net/docs/home
\n", "
\n", "\n", "
\n", "
writtenBy:
\n", "
Text-Fabric
\n", "
\n", "\n", "
\n", "
\n", "
\n", "\n", "
\n", "
\n", "posx\n", "
\n", "
str
\n", "
\n", " part-of-speech of word, refined class\n", "
\n", "\n", "
\n", "
acronym:
\n", "
quran
\n", "
\n", "\n", "
\n", "
convertedBy:
\n", "
Dirk Roorda and Cornelis van Lit
\n", "
\n", "\n", "
\n", "
createdBy:
\n", "
Kais Dukes
\n", "
\n", "\n", "
\n", "
createdDate:
\n", "
2011
\n", "
\n", "\n", "
\n", "
dateWritten:
\n", "
2019-05-13T07:17:56Z
\n", "
\n", "\n", "
\n", "
documentation:
\n", "
http://corpus.quran.com/documentation/tagset.jsp
\n", "
\n", "\n", "
\n", "
license1:
\n", "
Open Source, unspecified, see http://corpus.quran.com/releasenotes.jsp
\n", "
\n", "\n", "
\n", "
license2:
\n", "
Creative Commons BY-ND 3.0 Unported
\n", "
\n", "\n", "
\n", "
source1:
\n", "
Morphology: Quranic Arabic Corpus 0.4 (2011) by Kais Dukes
\n", "
\n", "\n", "
\n", "
source1Url:
\n", "
http://corpus.quran.com
\n", "
\n", "\n", "
\n", "
source2:
\n", "
Text: Tanzil Quran Text (Uthmani, version 1.0.2)
\n", "
\n", "\n", "
\n", "
source2Url:
\n", "
http://tanzil.net/docs/home
\n", "
\n", "\n", "
\n", "
writtenBy:
\n", "
Text-Fabric
\n", "
\n", "\n", "
\n", "
\n", "
\n", "\n", "
\n", "
\n", "ps\n", "
\n", "
str
\n", "
\n", " person of word (1st, 2nd, 3rd)\n", "
\n", "\n", "
\n", "
\n", "
\n", "\n", "
\n", "
\n", "root\n", "
\n", "
str
\n", "
\n", " root of word\n", "
\n", "\n", "
\n", "
\n", "
\n", "\n", "
\n", "
\n", "sp\n", "
\n", "
str
\n", "
\n", " not yet understood\n", "
\n", "\n", "
\n", "
\n", "
\n", "\n", "
\n", "
\n", "space\n", "
\n", "
str
\n", "
\n", " material between this word and the next\n", "
\n", "\n", "
\n", "
\n", "
\n", "\n", "
\n", "
\n", "tense\n", "
\n", "
str
\n", "
\n", " tense of a verb (perfect, imperfect, ...)\n", "
\n", "\n", "
\n", "
\n", "
\n", "\n", "
\n", "
\n", "translation@ll\n", "
\n", "
str
\n", "
\n", " english translation of whole aya\n", "
\n", "\n", "
\n", "
translator:
\n", "
Arthur Arberry (1955), https://en.wikipedia.org/wiki/Arthur_John_Arberry
\n", "
\n", "\n", "
\n", "
\n", "
\n", "\n", "
\n", "
\n", "type\n", "
\n", "
str
\n", "
\n", " type of sura\n", "
\n", "\n", "
\n", "
\n", "
\n", "\n", "
\n", "
\n", "unicode\n", "
\n", "
str
\n", "
\n", " unicode arabic text of word\n", "
\n", "\n", "
\n", "
\n", "
\n", "\n", "
\n", "
\n", "voice\n", "
\n", "
str
\n", "
\n", " voice of a verb (active, passive)\n", "
\n", "\n", "
\n", "
\n", "
\n", "\n", "
\n", "
\n", "w\n", "
\n", "
str
\n", "
\n", " not yet understood\n", "
\n", "\n", "
\n", "
\n", "
\n", "\n", "
\n", "
\n", "wx\n", "
\n", "
str
\n", "
\n", " not yet understood\n", "
\n", "\n", "
\n", "
\n", "
\n", "\n", "
\n", "
\n", "oslots\n", "
\n", "
none
\n", "
\n", " Quran: plain text plus morphological annotations at the word level\n", "
\n", "\n", "
\n", "
\n", "
\n", "\n", "
\n", "
\n" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "\n", "\n" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "
Text-Fabric API: names N F E L T S C TF directly usable

" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "A = use(\"q-ran/quran\", hoist=globals())" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## API\n", "\n", "At this point it is helpful to take a quick look at the Text-Fabric API documentation\n", "(see the links under **API Members** above).\n", "\n", "The most essential thing for now is that we can use `F` to access the data in the features\n", "we've loaded.\n", "But there is more, such as `N`, which helps us to walk over the text, as we will see in a minute." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Counting\n", "\n", "In order to get acquainted with the data, we start with the simple task of counting.\n", "\n", "## Count all nodes\n", "We use the\n", "[`N.walk()` generator](https://annotation.github.io/text-fabric/tf/core/nodes.html#tf.core.nodes.Nodes.walk)\n", "to walk through the nodes.\n", "\n", "We compared the corpus to a gigantic spreadsheet, where the rows correspond to the words.\n", "In Text-Fabric, we call the rows `slots`, because they are the textual positions that can be filled with words.\n", "\n", "We also mentioned that there are other textual objects,\n", "such as ayas and suras, and further units like juz, hizb, and page.\n", "They, too, correspond to rows in the big spreadsheet.\n", "\n", "In Text-Fabric we call all these rows *nodes*, and the `N.walk()` generator\n", "carries us through those nodes in the textual order.\n", "\n", "Just one extra thing: the `info` statements generate timed messages.\n", "If you use them instead of `print` you'll get a sense of the amount of time that\n", "the various processing steps typically need."
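, "\n", "\n", "This minimal sketch in plain Python makes the slot/node distinction concrete. It assumes a hypothetical toy corpus of six words (slots 1 to 6), two ayas and one sura. None of it is real Quran data, and `OSLOTS`, `slots_of` and `walk` are names made up for this illustration. The ordering rule in `walk` puts the earlier first slot first. On a tie, the node with the later last slot (the embedder) goes first. This rule is our simplification of Text-Fabric's canonical node order, not its actual implementation.

```python
# Hypothetical toy corpus: nodes 1-6 are slots (words).
# Nodes 7-9 are non-slot nodes, each linked to a set of slots (the role of oslots).
MAX_SLOT = 6
OSLOTS = {
    7: set(range(1, 7)),  # a sura covering all six words
    8: {1, 2, 3},  # its first aya
    9: {4, 5, 6},  # its second aya
}


def slots_of(node):
    # a slot node is linked to itself only
    return {node} if node <= MAX_SLOT else OSLOTS[node]


def walk(max_node):
    # Simplified textual order: earlier first slot first; on a tie, the node
    # with the later last slot (the embedder) first; then by node number.
    def key(n):
        s = slots_of(n)
        return (min(s), -max(s), n)

    return sorted(range(1, max_node + 1), key=key)


order = walk(9)
print(order)
print(len(order), 'nodes')
```

Counting what `walk` yields is the toy counterpart of the `N.walk()` loop in the next cell. Every node is visited exactly once, and containers come before their contents."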
] }, { "cell_type": "code", "execution_count": 6, "metadata": { "ExecuteTime": { "end_time": "2018-03-08T10:13:01.501437Z", "start_time": "2018-03-08T10:13:01.452315Z" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " 0.00s Counting nodes ...\n", " 0.03s 218282 nodes\n" ] } ], "source": [ "A.indent(reset=True)\n", "A.info(\"Counting nodes ...\")\n", "\n", "i = 0\n", "for n in N.walk():\n", " i += 1\n", "\n", "A.info(\"{} nodes\".format(i))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## What are those nodes?\n", "Every node has a type, such as word, aya, or sura.\n", "We know that we have more than 100,000 words, plus many other nodes.\n", "But what exactly are they?\n", "\n", "Text-Fabric has two special features, `otype` and `oslots`, that must occur in every Text-Fabric data set.\n", "`otype` tells you the type of each node, and you can also ask it for the slot type and the number of slots in the text.\n", "`oslots` links every non-slot node to the slots it occupies.\n", "\n", "Here we go!" ] }, { "cell_type": "code", "execution_count": 7, "metadata": { "ExecuteTime": { "end_time": "2018-03-08T10:13:05.040545Z", "start_time": "2018-03-08T10:13:05.019791Z" } }, "outputs": [ { "data": { "text/plain": [ "'word'" ] }, "execution_count": 7, "metadata": {}, "output_type": "execute_result" } ], "source": [ "F.otype.slotType" ] }, { "cell_type": "code", "execution_count": 8, "metadata": { "ExecuteTime": { "end_time": "2018-03-08T10:13:05.855607Z", "start_time": "2018-03-08T10:13:05.849069Z" } }, "outputs": [ { "data": { "text/plain": [ "128219" ] }, "execution_count": 8, "metadata": {}, "output_type": "execute_result" } ], "source": [ "F.otype.maxSlot" ] }, { "cell_type": "code", "execution_count": 9, "metadata": { "ExecuteTime": { "end_time": "2018-03-08T10:13:06.387787Z", "start_time": "2018-03-08T10:13:06.381228Z" } }, "outputs": [ { "data": { "text/plain": [ "218282" ] }, "execution_count": 9, "metadata": {}, "output_type": "execute_result" } ], "source": [ "F.otype.maxNode" ] }, { 
"cell_type": "code", 
"execution_count": 10, "metadata": { "ExecuteTime": { "end_time": "2018-03-08T10:13:07.770989Z", "start_time": "2018-03-08T10:13:07.764917Z" } }, "outputs": [ { "data": { "text/plain": [ "('manzil',\n", " 'sajda',\n", " 'juz',\n", " 'sura',\n", " 'hizb',\n", " 'ruku',\n", " 'page',\n", " 'aya',\n", " 'lex',\n", " 'group',\n", " 'word')" ] }, "execution_count": 10, "metadata": {}, "output_type": "execute_result" } ], "source": [ "F.otype.all" ] }, { "cell_type": "code", "execution_count": 11, "metadata": { "ExecuteTime": { "end_time": "2018-03-08T10:13:11.588722Z", "start_time": "2018-03-08T10:13:11.581305Z" } }, "outputs": [ { "data": { "text/plain": [ "(('manzil', 18317.0, 216987, 216993),\n", " ('sajda', 6043.066666666667, 218154, 218168),\n", " ('juz', 4273.966666666666, 212125, 212154),\n", " ('sura', 1124.7280701754387, 218169, 218282),\n", " ('hizb', 534.2458333333333, 211885, 212124),\n", " ('ruku', 230.60971223021582, 217598, 218153),\n", " ('page', 212.28311258278146, 216994, 217597),\n", " ('aya', 20.56109685695959, 128220, 134455),\n", " ('lex', 15.440397350993377, 212155, 216986),\n", " ('group', 1.6559557788425525, 134456, 211884),\n", " ('word', 1, 1, 128219))" ] }, "execution_count": 11, "metadata": {}, "output_type": "execute_result" } ], "source": [ "C.levels.data" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This is interesting: above you see all node types, each with the average size of its objects (in slots),\n", "the first node of that type, and the last node of that type." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Count individual object types\n", "This is an intuitive way to count the number of nodes of each type.\n", "Note in passing how we use `indent` in conjunction with `info` to produce neat timed\n", "and indented progress messages."
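, "\n", "\n", "Counting per type amounts to tallying the `otype` value of every node. The sketch below does that with `collections.Counter` on hypothetical toy data. The dictionaries `OTYPE` and `OSLOTS` and both functions are made up for illustration. For each type, the sketch also computes the average size in slots and the first and last node, loosely mirroring the columns of `C.levels.data` shown above. It is not how Text-Fabric computes these values internally.

```python
from collections import Counter

# Hypothetical toy data: node -> type, and non-slot node -> set of slots.
OTYPE = {1: 'word', 2: 'word', 3: 'word', 4: 'word', 5: 'aya', 6: 'aya', 7: 'sura'}
OSLOTS = {5: {1, 2}, 6: {3, 4}, 7: {1, 2, 3, 4}}


def count_types(otype):
    # one pass over all nodes, tallying their types
    return Counter(otype.values())


def levels(otype, oslots):
    # (type, average size in slots, first node, last node), largest objects first
    rows = []
    for t in set(otype.values()):
        nodes = [n for n, nt in otype.items() if nt == t]
        # slot nodes are not in oslots; they count as size 1
        sizes = [len(oslots.get(n, {n})) for n in nodes]
        rows.append((t, sum(sizes) / len(sizes), min(nodes), max(nodes)))
    return sorted(rows, key=lambda row: -row[1])


for t, n in count_types(OTYPE).most_common():
    print('{:>7} {}s'.format(n, t))
print(levels(OTYPE, OSLOTS))
```

The format string is the same one the counting cell below uses for its progress messages."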
] }, { "cell_type": "code", "execution_count": 12, "metadata": { "ExecuteTime": { "end_time": "2018-03-08T10:13:21.547051Z", "start_time": "2018-03-08T10:13:21.498807Z" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " 0.00s counting objects ...\n", " | 0.00s 7 manzils\n", " | 0.00s 15 sajdas\n", " | 0.00s 30 juzs\n", " | 0.00s 114 suras\n", " | 0.00s 240 hizbs\n", " | 0.00s 556 rukus\n", " | 0.00s 604 pages\n", " | 0.00s 6236 ayas\n", " | 0.00s 4832 lexs\n", " | 0.01s 77429 groups\n", " | 0.01s 128219 words\n", " 0.03s Done\n" ] } ], "source": [ "A.indent(reset=True)\n", "A.info(\"counting objects ...\")\n", "\n", "for otype in F.otype.all:\n", " i = 0\n", " A.indent(level=1, reset=True)\n", "\n", " for n in F.otype.s(otype):\n", " i += 1\n", "\n", " A.info(\"{:>7} {}s\".format(i, otype))\n", "\n", "A.indent(level=0)\n", "A.info(\"Done\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Viewing textual objects\n", "\n", "We use the `A` API (the extra power) to peek into the corpus." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let's inspect some words." ] }, { "cell_type": "code", "execution_count": 13, "metadata": { "ExecuteTime": { "end_time": "2018-05-18T09:18:02.282178Z", "start_time": "2018-05-18T09:18:02.274117Z" } }, "outputs": [ { "data": { "text/html": [ "
ى
" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "wordShow = (1000, 10000, 100000)\n", "for word in wordShow:\n", " A.pretty(word)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Feature statistics\n", "\n", "`F`\n", "gives access to all features.\n", "Every feature has a method\n", "`freqList()`\n", "to generate a frequency list of its values, higher frequencies first.\n", "Here are the parts of speech:" ] }, { "cell_type": "code", "execution_count": 14, "metadata": { "ExecuteTime": { "end_time": "2018-03-08T10:13:28.926742Z", "start_time": "2018-03-08T10:13:28.846500Z" } }, "outputs": [ { "data": { "text/plain": [ "(('pronoun', 29319),\n", " ('noun', 29049),\n", " ('verb', 19356),\n", " ('particle', 13511),\n", " ('preposition', 13006),\n", " ('conjunction', 10134),\n", " ('determiner', 8377),\n", " ('adjective', 1961),\n", " ('adverb', 1835),\n", " ('prefix', 1641),\n", " ('initials', 30))" ] }, "execution_count": 14, "metadata": {}, "output_type": "execute_result" } ], "source": [ "F.pos.freqList()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Lexeme matters\n", "\n", "## Top 10 frequent verbs\n", "\n", "If we count the frequency of words, we usually mean the frequency of their\n", "corresponding roots or lexemes.\n", "\n", "Let's start with roots." 
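, "\n", "\n", "Both the frequency list and the top-10 root count come down to counting values and sorting them by descending count. Here is a minimal sketch with `collections.Counter` on made-up (part of speech, root) pairs. `WORDS`, `freq_list` and `top_verb_roots` are hypothetical names. Ties are broken alphabetically, as the `sorted(..., key=lambda x: (-x[1], x[0]))` expression in the cells below does. We do not claim that this is how `freqList()` itself is implemented.

```python
from collections import Counter

# Hypothetical toy words as (part of speech, root) pairs; not corpus data.
# '-' stands for a word without a root.
WORDS = [
    ('verb', 'qwl'), ('noun', 'ktb'), ('verb', 'kwn'), ('verb', 'qwl'),
    ('particle', '-'), ('verb', 'kwn'), ('verb', 'Elm'), ('noun', 'qwl'),
]


def freq_list(values):
    # highest count first; equal counts in alphabetical order
    counts = Counter(values)
    return sorted(counts.items(), key=lambda x: (-x[1], x[0]))


def top_verb_roots(words, n=10):
    # keep verbs only, then count their roots
    return freq_list(root for pos, root in words if pos == 'verb')[:n]


print(freq_list(pos for pos, _ in WORDS))
print(top_verb_roots(WORDS, 2))
```

Swapping the root for the lemma in `top_verb_roots` gives the lexeme-based variant that follows the root count."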
] }, { "cell_type": "code", "execution_count": 15, "metadata": { "ExecuteTime": { "end_time": "2018-03-08T10:13:39.960696Z", "start_time": "2018-03-08T10:13:39.829670Z" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " 0.00s Collecting data\n", " 0.05s Done\n", "qwl: 1620\n", "kwn: 1358\n", "Amn: 558\n", "Aty: 535\n", "Elm: 425\n", "jEl: 340\n", "rAy: 315\n", "kfr: 304\n", "jyA: 278\n", "Eml: 276\n", "\n" ] } ], "source": [ "verbs = collections.Counter()\n", "A.indent(reset=True)\n", "A.info(\"Collecting data\")\n", "\n", "for w in F.otype.s(\"word\"):\n", " if F.pos.v(w) != \"verb\":\n", " continue\n", " verbs[F.root.v(w)] += 1\n", "\n", "A.info(\"Done\")\n", "print(\n", " \"\".join(\n", " \"{}: {}\\n\".format(verb, cnt)\n", " for (verb, cnt) in sorted(verbs.items(), key=lambda x: (-x[1], x[0]))[0:10]\n", " )\n", ")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now the same with lexemes.\n", "There are several methods for working with lexemes.\n", "\n", "### Method 1: counting words" ] }, { "cell_type": "code", "execution_count": 16, "metadata": { "ExecuteTime": { "end_time": "2018-03-08T10:13:39.960696Z", "start_time": "2018-03-08T10:13:39.829670Z" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " 0.00s Collecting data\n", " 0.05s Done\n", "qaAla: 1618\n", "kaAna: 1358\n", "'aAmana: 537\n", "Ealima: 382\n", "jaEala: 340\n", "kafara: 289\n", "jaA^'a: 278\n", "Eamila: 276\n", "A^taY: 271\n", "ra'aA: 271\n", "\n" ] } ], "source": [ "verbs = collections.Counter()\n", "A.indent(reset=True)\n", "A.info(\"Collecting data\")\n", "\n", "for w in F.otype.s(\"word\"):\n", " if F.pos.v(w) != \"verb\":\n", " continue\n", " verbs[F.lemma.v(w)] += 1\n", "\n", "A.info(\"Done\")\n", "print(\n", " \"\".join(\n", " \"{}: {}\\n\".format(verb, cnt)\n", " for (verb, cnt) in sorted(verbs.items(), key=lambda x: (-x[1], x[0]))[0:10]\n", " )\n", ")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Lexeme 
distribution\n", "\n", "Let's do a bit more fancy lexeme stuff.\n", "\n", "### Hapaxes\n", "\n", "A hapax can be found by inspecting lexemes and seeing how many word nodes they are linked to.\n", "If that number is one, we have a hapax.\n", "\n", "We print the first 10 hapaxes, in alphabetical order of their lemma." ] }, { "cell_type": "code", "execution_count": 17, "metadata": { "ExecuteTime": { "end_time": "2018-03-08T10:13:58.376059Z", "start_time": "2018-03-08T10:13:58.247752Z" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " 0.05s 1994 hapaxes found\n", "\t$aAkilat\n", "\t$aAni}\n", "\t$aAriko\n", "\t$aAwiro\n", "\t$aTo_#\n", "\t$a`Ti}\n", "\t$a`mixa`t\n", "\t$a`xiSap\n", "\t$afatayon\n", "\t$agafa\n" ] } ], "source": [ "A.indent(reset=True)\n", "\n", "lexIndex = collections.defaultdict(list)\n", "\n", "for n in F.otype.s(\"word\"):\n", " lexIndex[F.lemma.v(n)].append(n)\n", "\n", "hapax = dict((lex, occs) for (lex, occs) in lexIndex.items() if len(occs) == 1)\n", "\n", "A.info(\"{} hapaxes found\".format(len(hapax)))\n", "\n", "for h in sorted(hapax)[0:10]:\n", " print(f\"\\t{h}\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "If we want more info on the hapaxes, we can get it through their *nodes*.\n", "The `lexIndex` dictionary stores the occurrences of a lexeme as a list of nodes.\n", "\n", "Let's get the part of speech and the Arabic form of those 10 hapaxes."
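, "\n", "\n", "The hapax search rests on a single index from each lemma to the word nodes where it occurs. Once you have a node, you can look up any other feature for it. Here is a minimal sketch on made-up data. The plain dictionaries `LEMMA` and `POS` stand in for `F.lemma.v` and `F.pos.v`, which is an assumption for illustration. The lemma values are placeholders, not Quranic lemmas.

```python
from collections import defaultdict

# Hypothetical stand-ins for F.lemma.v and F.pos.v, keyed by word node.
LEMMA = {1: 'lexA', 2: 'lexB', 3: 'lexA', 4: 'lexC', 5: 'lexB', 6: 'lexD'}
POS = {1: 'verb', 2: 'verb', 3: 'verb', 4: 'noun', 5: 'verb', 6: 'noun'}


def build_lex_index(lemma):
    # lemma -> list of word nodes, in node (textual) order
    index = defaultdict(list)
    for node in sorted(lemma):
        index[lemma[node]].append(node)
    return index


def hapaxes(index):
    # a hapax is a lemma with exactly one occurrence
    return {lex: occs for lex, occs in index.items() if len(occs) == 1}


lex_index = build_lex_index(LEMMA)
hapax = hapaxes(lex_index)
for lex in sorted(hapax):
    node = hapax[lex][0]  # the single occurrence
    print(lex, node, POS[node])
```

Keeping the list of nodes, rather than just a count, is what makes the follow-up lookups possible without a second pass over the corpus."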
] }, { "cell_type": "code", "execution_count": 18, "metadata": { "ExecuteTime": { "end_time": "2018-03-08T10:14:00.590376Z", "start_time": "2018-03-08T10:14:00.580157Z" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\tnoun شَاكِلَتِ\n", "\tnoun شَانِئَ\n", "\tverb شَارِكْ\n", "\tverb شَاوِرْ\n", "\tnoun شَطْـَٔ\n", "\tnoun شَٰطِئِ\n", "\tadjective شَٰمِخَٰتٍ\n", "\tnoun شَٰخِصَةٌ\n", "\tnoun شَفَتَيْنِ\n", "\tverb شَغَفَ\n" ] } ], "source": [ "for h in sorted(hapax)[0:10]:\n", " node = hapax[h][0]\n", " print(f\"\\t{F.pos.v(node):<12} {F.unicode.v(node)}\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Small occurrence base\n", "\n", "The occurrence base of a lexeme is the set of suras in which it occurs.\n", "Let's look for lexemes that occur in a single sura.\n", "\n", "The hapaxes we found above trivially occur in a single sura, so we print a sample of two lists: all single-sura lexemes, and only those that are not hapaxes." ] }, { "cell_type": "code", "execution_count": 19, "metadata": { "ExecuteTime": { "end_time": "2018-03-08T10:14:03.503420Z", "start_time": "2018-03-08T10:14:02.841707Z" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " 0.00s Finding single sura lexemes\n", " 0.61s 2228 single sura lexemes found\n", "=====================================\n", ">aZolama (1x) first 2:20 last 2:20\n", "Ha*ar (2x) first 2:19 last 2:243\n", "Say~ib (1x) first 2:19 last 2:19\n", "baEuwDap (1x) first 2:26 last 2:26\n", "magoDuwb (1x) first 1:7 last 1:7\n", "nuqad~isu (1x) first 2:30 last 2:30\n", "rabiHat (1x) first 2:16 last 2:16\n", "vamarap (1x) first 2:25 last 2:25\n", "yasofiku (2x) first 2:30 last 2:84\n", "{sotawoqada (1x) first 2:17 last 2:17\n", "=====================================\n", "$aTor (5x) first 2:144 last 2:150\n", "Ha*ar (2x) first 2:19 last 2:243\n", "Hayov2 (2x) first 2:144 last 2:150\n", "Hur~ (2x) first 2:178 last 2:178\n", "Sibogap (2x) first 2:138 last 2:138\n", "baqarap (4x) first 2:67 last 2:71\n", "huwd2 (3x) first 2:111 last 
2:140\n", "taTaw~aEa (2x) first 
2:158 last 2:184\n", "yasofiku (2x) first 2:30 last 2:84\n", "yataEal~amu (2x) first 2:102 last 2:102\n" ] } ], "source": [ "A.indent(reset=True)\n", "A.info(\"Finding single sura lexemes\")\n", "\n", "lexSuraIndex = {}\n", "\n", "for (lex, occs) in lexIndex.items():\n", " lexSuraIndex[lex] = set(L.u(n, otype=\"sura\")[0] for n in occs)\n", "\n", "singleSura = [\n", " (lex, occs)\n", " for (lex, occs) in lexIndex.items()\n", " if len(lexSuraIndex.get(lex, [])) == 1\n", "]\n", "singleSuraWithoutHapax = [(lex, occs) for (lex, occs) in singleSura if len(occs) != 1]\n", "\n", "A.info(\"{} single sura lexemes found\".format(len(singleSura)))\n", "\n", "for data in (singleSura, singleSuraWithoutHapax):\n", " print(\"=====================================\")\n", " for (lex, occs) in sorted(data[0:10]):\n", " print(\n", " \"{:<15} ({}x) first {:>5} last {:>5}\".format(\n", " lex,\n", " len(occs),\n", " \"{}:{}\".format(*T.sectionFromNode(occs[0])),\n", " \"{}:{}\".format(*T.sectionFromNode(occs[-1])),\n", " )\n", " )" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Confined to suras\n", "\n", "As a final exercise with lexemes, let's make a list of all suras that shows, for each sura, its total number of lexemes and\n", "the number of lexemes that occur exclusively in that sura."
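, "\n", "\n", "The cells below are built on two inverted indexes: lexemes per section and sections per lexeme. A lexeme is *own* to a section when its set of sections has exactly one member. Here is a minimal sketch on a hypothetical toy mapping from each section to the lemmas of its words, standing in for `L.d` plus `F.lemma`. `SECTIONS` and `own_lexemes` are made-up names.

```python
from collections import defaultdict

# Hypothetical toy data: section number -> lemmas of its words, in text order.
SECTIONS = {
    1: ['a', 'b', 'c', 'a'],
    2: ['a', 'd', 'd'],
    3: ['b', 'e'],
}


def own_lexemes(sections):
    # index 1: section -> set of distinct lexemes
    lexemes_per_section = {s: set(lexes) for s, lexes in sections.items()}
    # index 2: lexeme -> set of sections it occurs in
    sections_per_lexeme = defaultdict(set)
    for s, lexes in lexemes_per_section.items():
        for lex in lexes:
            sections_per_lexeme[lex].add(s)
    rows = []
    for s, lexes in lexemes_per_section.items():
        # own lexemes occur in this section and nowhere else
        own = [lex for lex in lexes if len(sections_per_lexeme[lex]) == 1]
        a, o = len(lexes), len(own)
        rows.append((s, a, o, 100 * o / a))  # assumes no empty section
    # highest share first, then most lexemes, then section number
    return sorted(rows, key=lambda e: (-e[3], -e[1], e[0]))


for row in own_lexemes(SECTIONS):
    print('{:<4} {:>4} {:>4} {:>5.1f}%'.format(*row))
```

The sort key matches the one used for the sura table below, so the most distinctive sections come first."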
] }, { "cell_type": "code", "execution_count": 20, "metadata": { "ExecuteTime": { "end_time": "2018-03-08T10:14:07.670486Z", "start_time": "2018-03-08T10:14:07.479785Z" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " 0.00s Making sura-lexeme index\n", " 0.08s Found 4833 lexemes\n" ] } ], "source": [ "A.indent(reset=True)\n", "A.info(\"Making sura-lexeme index\")\n", "\n", "allSura = collections.defaultdict(set)\n", "allLex = set()\n", "\n", "for s in F.otype.s(\"sura\"):\n", " for w in L.d(s, \"word\"):\n", " ln = F.lemma.v(w)\n", " allSura[s].add(ln)\n", " allLex.add(ln)\n", "\n", "A.info(\"Found {} lexemes\".format(len(allLex)))" ] }, { "cell_type": "code", "execution_count": 21, "metadata": { "ExecuteTime": { "end_time": "2018-03-08T10:14:09.712557Z", "start_time": "2018-03-08T10:14:09.068800Z" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " 0.00s Finding single sura lexemes\n", " 0.60s found 2228 single sura lexemes\n" ] } ], "source": [ "A.indent(reset=True)\n", "A.info(\"Finding single sura lexemes\")\n", "\n", "lexSuraIndex = {}\n", "\n", "for (lex, occs) in lexIndex.items():\n", " lexSuraIndex[lex] = set(L.u(n, otype=\"sura\")[0] for n in occs)\n", "\n", "singleSuraLex = collections.defaultdict(set)\n", "for (lex, suras) in lexSuraIndex.items():\n", " if len(suras) == 1:\n", " singleSuraLex[list(suras)[0]].add(lex)\n", "\n", "singleSura = {sura: len(lexs) for (sura, lexs) in singleSuraLex.items()}\n", "\n", "A.info(\"found {} single sura lexemes\".format(sum(singleSura.values())))" ] }, { "cell_type": "code", "execution_count": 22, "metadata": { "ExecuteTime": { "end_time": "2018-03-08T10:14:10.997607Z", "start_time": "2018-03-08T10:14:10.964980Z" }, "lines_to_end_of_cell_marker": 2 }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "sura name sura #all #own %own\n", "---------------------------------------------------\n", "Abundance 108 9 4 44.4%\n", "Quraysh 106 16 5 31.2%\n", "The 
Dawn 113 17 5 29.4%\n", "The Chargers 100 32 9 28.1%\n", "Sincerity 112 9 2 22.2%\n", "The Traducer 104 28 6 21.4%\n", "The Palm Fibre 111 21 4 19.0%\n", "The Overwhelming 88 69 13 18.8%\n", "The Beneficent 55 142 26 18.3%\n", "The Overthrowing 81 77 14 18.2%\n", "The Morning Star 86 44 8 18.2%\n", "The Elephant 105 22 4 18.2%\n", "The Sun 91 45 8 17.8%\n", "Defrauding 83 96 17 17.7%\n", "The Inevitable 56 206 36 17.5%\n", "The City 90 63 11 17.5%\n", "The Calamity 101 24 4 16.7%\n", "Those who drag forth 79 127 21 16.5%\n", "He frowned 80 103 17 16.5%\n", "The Resurrection 75 104 17 16.3%\n", "The Morning Hours 93 31 5 16.1%\n", "The Dawn 89 94 15 16.0%\n", "The Emissaries 77 108 17 15.7%\n", "Mary 19 360 52 14.4%\n", "The Reality 69 157 22 14.0%\n", "The Repentance 9 638 89 13.9%\n", "The Cave 18 552 71 12.9%\n", "The Star 53 188 24 12.8%\n", "The Cow 2 1137 145 12.8%\n", "Joseph 12 512 65 12.7%\n", "Mankind 114 16 2 12.5%\n", "The Pen 68 171 21 12.3%\n", "The Cloaked One 74 155 19 12.3%\n", "The Moon 54 188 22 11.7%\n", "The Enshrouded One 73 129 14 10.9%\n", "The Announcement 78 122 13 10.7%\n", "The Splitting Open 84 76 8 10.5%\n", "Taa-Haa 20 483 50 10.4%\n", "The Light 24 416 43 10.3%\n", "Those drawn up in Ranks 37 360 37 10.3%\n", "Noah 71 128 13 10.2%\n", "The Table 5 685 69 10.1%\n", "The letter Saad 38 338 34 10.1%\n", "Competition 102 20 2 10.0%\n", "The Night Journey 17 533 53 9.9%\n", "The Clans 33 454 43 9.5%\n", "The Women 4 810 75 9.3%\n", "The Pilgrimage 22 486 44 9.1%\n", "The Fig 95 34 3 8.8%\n", "Muhammad 47 239 21 8.8%\n", "The letter Qaaf 50 207 18 8.7%\n", "The Jinn 72 139 12 8.6%\n", "The Prophets 21 423 35 8.3%\n", "The Clot 96 50 4 8.0%\n", "Sheba 34 330 26 7.9%\n", "The Cattle 6 725 57 7.9%\n", "The Family of Imraan 3 761 59 7.8%\n", "The Spoils of War 8 400 31 7.8%\n", "The Ascending Stairways 70 142 11 7.7%\n", "Man 76 155 12 7.7%\n", "The Winnowing Winds 51 194 15 7.7%\n", "The Stories 28 468 36 7.7%\n", "The Mount 52 182 14 7.7%\n", 
"The Inner Apartments 49 160 12 7.5%\n", "The Poets 26 410 30 7.3%\n", "Hud 11 554 40 7.2%\n", "The Declining Day, Epoch 103 14 1 7.1%\n", "The Bee 16 552 39 7.1%\n", "The Exile 59 213 15 7.0%\n", "The Night 92 57 4 7.0%\n", "The Ant 27 414 29 7.0%\n", "The Rock 15 289 20 6.9%\n", "The Heights 7 819 51 6.2%\n", "The Criterion 25 372 23 6.2%\n", "The Pleading Woman 58 198 12 6.1%\n", "Ornaments of gold 43 334 20 6.0%\n", "The Victory 48 253 15 5.9%\n", "The Cleaving 82 54 3 5.6%\n", "Divorce 65 148 8 5.4%\n", "Abraham 14 334 18 5.4%\n", "The Iron 57 249 13 5.2%\n", "The Thunder 13 348 18 5.2%\n", "The Believers 23 392 20 5.1%\n", "The Evidence 98 59 3 5.1%\n", "The Most High 87 61 3 4.9%\n", "The Romans 30 290 14 4.8%\n", "Almsgiving 107 21 1 4.8%\n", "Yaseen 36 298 14 4.7%\n", "The Sovereignty 67 171 8 4.7%\n", "The Consolation 94 22 1 4.5%\n", "Luqman 31 248 11 4.4%\n", "The Smoke 44 183 8 4.4%\n", "The Power, Fate 97 23 1 4.3%\n", "The Originator 35 335 14 4.2%\n", "The Prohibition 66 144 6 4.2%\n", "The Opening 1 24 1 4.2%\n", "The Constellations 85 77 3 3.9%\n", "Explained in detail 41 311 12 3.9%\n", "The Forgiver 40 398 15 3.8%\n", "The Earthquake 99 28 1 3.6%\n", "The Groups 39 393 14 3.6%\n", "She that is to be examined 60 158 5 3.2%\n", "The Hypocrites 63 103 3 2.9%\n", "Jonas 10 486 13 2.7%\n", "Consultation 42 304 8 2.6%\n", "The Dunes 46 275 7 2.5%\n", "Crouching 45 201 5 2.5%\n", "The Ranks 61 124 3 2.4%\n", "The Spider 29 336 7 2.1%\n", "The Prostration 32 193 2 1.0%\n", "Friday 62 104 1 1.0%\n", "Mutual Disillusion 64 138 1 0.7%\n", "Divine Support 110 19 0 0.0%\n", "The Disbelievers 109 9 0 0.0%\n" ] } ], "source": [ "print(\n", " \"{:<30} {:>4} {:>4} {:>4} {:>5}\\n{}\".format(\n", " \"sura name\",\n", " \"sura\",\n", " \"#all\",\n", " \"#own\",\n", " \"%own\",\n", " \"-\" * 51,\n", " )\n", ")\n", "suraList = []\n", "\n", "for s in F.otype.s(\"sura\"):\n", " suraName = Fs(\"name@en\").v(s)\n", " sura = T.suraName(s)\n", " a = len(allSura[s])\n", " o 
= singleSura.get(s, 0)\n", " p = 100 * o / a\n", " suraList.append((suraName, sura, a, o, p))\n", "\n", "for x in sorted(suraList, key=lambda e: (-e[4], -e[2], e[1])):\n", " print(\"{:<30} {:>4} {:>4} {:>4} {:>4.1f}%\".format(*x))" ] }, { "cell_type": "markdown", "metadata": { "lines_to_next_cell": 2 }, "source": [ "## For all section types\n", "\n", "What we did for suras, we can also do for the other section types.\n", "\n", "We generalize the task into a function that accepts the kind of section as a parameter.\n", "Then we can call that function for each of our section types." ] }, { "cell_type": "code", "execution_count": 23, "metadata": {}, "outputs": [], "source": [ "def lexBase(section):\n", " # make indices\n", " lexemesPerSection = {}\n", " sectionsPerLexeme = {}\n", " for s in F.otype.s(section):\n", " for w in L.d(s, otype=\"word\"):\n", " lex = F.lemma.v(w)\n", " lexemesPerSection.setdefault(s, set()).add(lex)\n", " sectionsPerLexeme.setdefault(lex, set()).add(s)\n", "\n", " print(\n", " \"{:<10} {:>4} {:>4} {:>5}\\n{}\".format(\n", " section,\n", " \"#all\",\n", " \"#own\",\n", " \"%own\",\n", " \"-\" * 26,\n", " )\n", " )\n", " sectionList = []\n", "\n", " for s in F.otype.s(section):\n", " n = F.number.v(s)\n", " myLexes = lexemesPerSection[s]\n", " a = len(myLexes)\n", " o = len([lex for lex in myLexes if len(sectionsPerLexeme[lex]) == 1])\n", " p = 100 * o / a\n", " sectionList.append((n, a, o, p))\n", "\n", " for x in sorted(sectionList, key=lambda e: (-e[3], -e[1], e[0])):\n", " print(\"{:<10} {:>4} {:>4} {:>4.1f}%\".format(*x))\n", " print(\"=\" * 26)" ] }, { "cell_type": "code", "execution_count": 24, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "manzil #all #own %own\n", "--------------------------\n", "7 2120 685 32.3%\n", "4 1907 415 21.8%\n", "1 1694 302 17.8%\n", "2 1773 316 17.8%\n", "5 1580 235 14.9%\n", "3 1493 222 14.9%\n", "6 1516 215 14.2%\n", "==========================\n" ] } ], "source": [ "for 
section in (\n", " \"manzil\",\n", " # 'sajda',\n", " # 'juz',\n", " # 'ruku',\n", " # 'hizb',\n", " # 'page',\n", "):\n", " lexBase(section)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Layer API\n", "We travel upwards and downwards, forwards and backwards through the nodes.\n", "The Layer API (`L`) provides `u()` for going up, `d()` for going down,\n", "`n()` for going to next nodes, and `p()` for going to previous nodes.\n", "\n", "These directions are indirect notions: nodes are just numbers, but by means of the\n", "`oslots` feature they are linked to slots. One node *contains* another node if the one is linked to a set of slots that contains the set of slots that the other is linked to.\n", "And one node is next to or previous to another node if its slots follow or precede the slots of the other one.\n", "\n", "`L.u(node)` **Up** goes to the nodes that embed `node`.\n", "\n", "`L.d(node)` **Down** goes in the opposite direction, to the nodes that are contained in `node`.\n", "\n", "`L.n(node)` **Next** are the next *adjacent* nodes, i.e. nodes whose first slot comes immediately after the last slot of `node`.\n", "\n", "`L.p(node)` **Previous** are the previous *adjacent* nodes, i.e. nodes whose last slot comes immediately before the first slot of `node`.\n", "\n", "All these functions yield nodes of all possible node types.\n", "By passing the optional parameter `otype`, you can restrict the results to nodes of that type.\n", "\n", "The results are ordered according to the order of things in the text.\n", "\n", "The functions always return a tuple, even if there is just one node in the result.\n", "\n", "## Going up\n", "We go from the first word to the sura that contains it.\n", "Note the `[0]` at the end. You expect one sura, yet `L` returns a tuple.\n", "To get the only element of that tuple, you need that `[0]`.\n", "\n", "If you are like me, you keep forgetting it, and that will lead to weird error messages later on."
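] }, { "cell_type": "markdown", "metadata": {}, "source": [ "To make these definitions concrete, here is a minimal, self-contained sketch that mimics `L.u()`, `L.d()`, `L.n()` and `L.p()` on a handful of made-up nodes.\n", "It is **not** Text-Fabric's implementation and it does not touch the Quran data.\n", "Assumptions: the node names and slot numbers are hypothetical, word (slot) nodes are left out,\n", "*contains* is taken to mean strict inclusion of slot sets, and results are sorted alphabetically instead of in text order.\n", "The trailing underscore in `next_` only avoids shadowing Python's built-in `next`." ] },
{ "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Toy model with hypothetical nodes, NOT Text-Fabric's implementation.\n", "# Each node is mapped to the set of slots it occupies, like the `oslots` feature.\n", "oslots = {\n", "    \"sura1\": set(range(1, 8)),  # slots 1..7\n", "    \"aya1\": {1, 2, 3},\n", "    \"aya2\": {4, 5, 6, 7},\n", "    \"group1\": {1, 2},\n", "    \"group2\": {4, 5},\n", "}\n", "\n", "\n", "def up(node):\n", "    \"\"\"Nodes whose slots strictly include the slots of `node`.\"\"\"\n", "    mine = oslots[node]\n", "    return sorted(m for m, s in oslots.items() if mine < s)\n", "\n", "\n", "def down(node):\n", "    \"\"\"Nodes whose slots are strictly included in the slots of `node`.\"\"\"\n", "    mine = oslots[node]\n", "    return sorted(m for m, s in oslots.items() if s < mine)\n", "\n", "\n", "def next_(node):\n", "    \"\"\"Adjacent nodes whose first slot comes right after the last slot of `node`.\"\"\"\n", "    last = max(oslots[node])\n", "    return sorted(m for m, s in oslots.items() if min(s) == last + 1)\n", "\n", "\n", "def prev(node):\n", "    \"\"\"Adjacent nodes whose last slot comes right before the first slot of `node`.\"\"\"\n", "    first = min(oslots[node])\n", "    return sorted(m for m, s in oslots.items() if max(s) == first - 1)\n", "\n", "\n", "print(up(\"group1\"))  # ['aya1', 'sura1']\n", "print(down(\"aya2\"))  # ['group2']\n", "print(next_(\"aya1\"))  # ['aya2', 'group2']\n", "print(prev(\"aya2\"))  # ['aya1']" ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "Back to the real corpus, where `L.u()` takes us from word 1 up to its sura (mind the `[0]`)."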
] }, { "cell_type": "code", "execution_count": 25, "metadata": { "ExecuteTime": { "end_time": "2018-03-08T10:14:15.350367Z", "start_time": "2018-03-08T10:14:15.343039Z" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "218169\n" ] } ], "source": [ "firstSura = L.u(1, otype=\"sura\")[0]\n", "print(firstSura)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "And let's see all the containing objects of word 3:" ] }, { "cell_type": "code", "execution_count": 26, "metadata": { "ExecuteTime": { "end_time": "2018-03-08T10:14:17.364197Z", "start_time": "2018-03-08T10:14:17.352186Z" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "word 3 is contained in manzil 216987\n", "word 3 is contained in sajda x\n", "word 3 is contained in juz 212125\n", "word 3 is contained in sura 218169\n", "word 3 is contained in hizb 211885\n", "word 3 is contained in ruku 217598\n", "word 3 is contained in page 216994\n", "word 3 is contained in aya 128220\n", "word 3 is contained in lex 212156\n", "word 3 is contained in group 134457\n" ] } ], "source": [ "w = 3\n", "for otype in F.otype.all:\n", " if otype == F.otype.slotType:\n", " continue\n", " up = L.u(w, otype=otype)\n", " upNode = \"x\" if len(up) == 0 else up[0]\n", " print(\"word {} is contained in {} {}\".format(w, otype, upNode))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Going next\n", "Let's go to the next nodes of the first sura."
] }, { "cell_type": "code", "execution_count": 27, "metadata": { "ExecuteTime": { "end_time": "2018-03-08T10:14:20.775341Z", "start_time": "2018-03-08T10:14:20.762875Z" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " 49: word first slot=49 , last slot=49 \n", " 134485: group first slot=49 , last slot=49 \n", " 128227: aya first slot=49 , last slot=49 \n", " 216995: page first slot=49 , last slot=112 \n", " 217599: ruku first slot=49 , last slot=149 \n", " 218170: sura first slot=49 , last slot=10291 \n" ] } ], "source": [ "afterFirstSura = L.n(firstSura)\n", "for n in afterFirstSura:\n", " print(\n", " \"{:>7}: {:<13} first slot={:<6}, last slot={:<6}\".format(\n", " n,\n", " F.otype.v(n),\n", " E.oslots.s(n)[0],\n", " E.oslots.s(n)[-1],\n", " )\n", " )\n", "secondSura = L.n(firstSura, otype=\"sura\")[0]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Going previous\n", "\n", "And let's see what is right before the second sura." ] }, { "cell_type": "code", "execution_count": 28, "metadata": { "ExecuteTime": { "end_time": "2018-03-08T10:14:23.300424Z", "start_time": "2018-03-08T10:14:23.292226Z" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " 218169: sura first slot=1 , last slot=48 \n", " 217598: ruku first slot=1 , last slot=48 \n", " 216994: page first slot=1 , last slot=48 \n", " 128226: aya first slot=34 , last slot=48 \n", " 134484: group first slot=47 , last slot=48 \n", " 48: word first slot=48 , last slot=48 \n" ] } ], "source": [ "for n in L.p(secondSura):\n", " print(\n", " \"{:>7}: {:<13} first slot={:<6}, last slot={:<6}\".format(\n", " n,\n", " F.otype.v(n),\n", " E.oslots.s(n)[0],\n", " E.oslots.s(n)[-1],\n", " )\n", " )" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Going down" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We go down to the ayas of the second sura, and just count them."
] }, { "cell_type": "code", "execution_count": 29, "metadata": { "ExecuteTime": { "end_time": "2018-03-08T10:14:25.969860Z", "start_time": "2018-03-08T10:14:25.957084Z" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "286\n" ] } ], "source": [ "ayas = L.d(secondSura, otype=\"aya\")\n", "print(len(ayas))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## The first aya\n", "We pick the first aya and the first word, and explore what is above and below them." ] }, { "cell_type": "code", "execution_count": 30, "metadata": { "ExecuteTime": { "end_time": "2018-03-08T10:14:28.468853Z", "start_time": "2018-03-08T10:14:28.416537Z" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Node 1\n", " | UP\n", " | | 134456 group\n", " | | 128220 aya\n", " | | 216994 page\n", " | | 217598 ruku\n", " | | 218169 sura\n", " | | 211885 hizb\n", " | | 212125 juz\n", " | | 216987 manzil\n", " | DOWN\n", " | | \n", "Node 128220\n", " | UP\n", " | | 216994 page\n", " | | 217598 ruku\n", " | | 218169 sura\n", " | | 211885 hizb\n", " | | 212125 juz\n", " | | 216987 manzil\n", " | DOWN\n", " | | 134456 group\n", " | | 1 word\n", " | | 2 word\n", " | | 134457 group\n", " | | 3 word\n", " | | 134458 group\n", " | | 4 word\n", " | | 5 word\n", " | | 134459 group\n", " | | 6 word\n", " | | 7 word\n", "Done\n" ] } ], "source": [ "for n in [1, L.u(1, otype=\"aya\")[0]]:\n", " A.indent(level=0)\n", " A.info(\"Node {}\".format(n), tm=False)\n", " A.indent(level=1)\n", " A.info(\"UP\", tm=False)\n", " A.indent(level=2)\n", " A.info(\"\\n\".join([\"{:<15} {}\".format(u, F.otype.v(u)) for u in L.u(n)]), tm=False)\n", " A.indent(level=1)\n", " A.info(\"DOWN\", tm=False)\n", " A.indent(level=2)\n", " A.info(\"\\n\".join([\"{:<15} {}\".format(u, F.otype.v(u)) for u in L.d(n)]), tm=False)\n", "A.indent(level=0)\n", "A.info(\"Done\", tm=False)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Text API\n", "\n", "So far, we have mainly 
seen nodes and their numbers, and the names of node types.\n", "You would almost forget that we are dealing with text.\n", "So let's try to see some text.\n", "\n", "In the same way as `F` gives access to feature data,\n", "`T` gives access to the text.\n", "That is also feature data, but you can tell Text-Fabric which features specifically\n", "carry the text, and in return Text-Fabric offers you\n", "a Text API: `T`.\n", "\n", "## Formats\n", "Arabic text can be represented in a number of ways:\n", "\n", "* in transliteration, or in Arabic characters,\n", "* showing the actual text, or only the lexemes or roots.\n", "\n", "If you wonder where the information about text formats is stored:\n", "not in the Text-Fabric program, but in the data set.\n", "It has a feature `otext`, which specifies the formats and which features\n", "must be used to produce them. `otext` is the third special feature in a TF data set,\n", "next to `otype` and `oslots`.\n", "It is an optional feature:\n", "if it is absent, there will be no `T` API.\n", "\n", "Here is a list of all available formats in this data set." ] }, { "cell_type": "code", "execution_count": 31, "metadata": { "ExecuteTime": { "end_time": "2018-03-08T10:14:31.965590Z", "start_time": "2018-03-08T10:14:31.955945Z" } }, "outputs": [ { "data": { "text/plain": [ "['lex-trans-full', 'root-trans-full', 'text-orig-full', 'text-trans-full']" ] }, "execution_count": 31, "metadata": {}, "output_type": "execute_result" } ], "source": [ "sorted(T.formats)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Using the formats\n", "\n", "We can also pretty-display words in a different format, here `text-trans-full`:" ] }, { "cell_type": "code", "execution_count": 32, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
Y
" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "for word in wordShow:\n", " A.pretty(word, fmt=\"text-trans-full\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now let's use those formats to print out the first aya of the Quran." ] }, { "cell_type": "code", "execution_count": 33, "metadata": { "ExecuteTime": { "end_time": "2018-03-08T10:14:35.240131Z", "start_time": "2018-03-08T10:14:35.231752Z" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "lex-trans-full:\n", "\t{som {ll~ah r~aHoma`n r~aHiym\n", "root-trans-full:\n", "\tsmw Alh rHm rHm\n", "text-orig-full:\n", "\tبِسْمِ ٱللَّهِ ٱلرَّحْمَٰنِ ٱلرَّحِيمِ\n", "text-trans-full:\n", "\tbisomi {ll~ahi {lr~aHoma`ni {lr~aHiymi\n" ] } ], "source": [ "a1 = F.otype.s(\"aya\")[0]\n", "\n", "for fmt in sorted(T.formats):\n", " print(\"{}:\\n\\t{}\".format(fmt, T.text(a1, fmt=fmt, descend=True)))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "If we do not specify a format, the **default** format is used (`text-orig-full`)." ] }, { "cell_type": "code", "execution_count": 34, "metadata": { "ExecuteTime": { "end_time": "2018-03-08T10:14:37.681491Z", "start_time": "2018-03-08T10:14:37.674538Z" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "بِسْمِ ٱللَّهِ ٱلرَّحْمَٰنِ ٱلرَّحِيمِ\n" ] } ], "source": [ "print(T.text(a1))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Whole text in all formats in about a second\n", "Part of the pleasure of working with computers is that they can crunch massive amounts of data.\n", "The text of the Quran is a piece of cake.\n", "\n", "It takes less than a second to have that cake and eat it,\n", "in all four formats."
] }, { "cell_type": "code", "execution_count": 35, "metadata": { "ExecuteTime": { "end_time": "2018-03-08T10:14:40.870244Z", "start_time": "2018-03-08T10:14:39.998071Z" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " 0.00s writing plain text of whole Quran in all formats\n", " 0.90s done 4 formats\n", "lex-trans-full\n", "{som {ll~ah r~aHoma`n r~aHiym\n", "Hamod {ll~ah rab~ Ea`lamiyn\n", "r~aHoma`n r~aHiym\n", "ma`lik yawom diyn\n", "anoEamota Ealayohimo gayori {lomagoDuwbi Ealayohimo walaA {lD~aA^l~iyna\n", "Al^m^\n", "*a`lika {lokita`bu laA rayoba fiyhi hudFY l~ilomut~aqiyna\n", "{l~a*iyna yu&ominuwna bi{logayobi wayuqiymuwna {lS~alaw`pa wamim~aA razaqona`humo yunfiquwna\n", "wa{l~a*iyna yu&ominuwna bimaA^ >unzila unzila min qabolika wabi{lo'aAxirapi humo yuwqinuwna\n", ">uw@la`^}ika EalaY` hudFY m~in r~ab~ihimo wa>uw@la`^}ika humu {lomufoliHuwna\n", "an*arotahumo >amo lamo tun*irohumo laA yu&ominuwna\n", "xatama {ll~ahu EalaY` quluwbihimo waEalaY` samoEihimo waEalaY`^ >aboSa`rihimo gi$a`wapN walahumo Ea*aAbN EaZiymN\n", "wamina {ln~aAsi man yaquwlu 'aAman~aA bi{ll~ahi wabi{loyawomi {lo'aAxiri wamaA hum bimu&ominiyna\n", "yuxa`diEuwna {ll~aha wa{l~a*iyna 'aAmanuwA@ wamaA yaxodaEuwna anfusahumo wamaA ya$oEuruwna\n", "fiY quluwbihim m~araDN fazaAdahumu {ll~ahu maraDFA walahumo Ea*aAbN >aliymN[ bimaA kaAnuwA@ yako*ibuwna\n", "waaroDi qaAluw^A@ alaA^ anu&ominu kamaA^ 'aAmana {ls~ufahaA^'u >alaA^