# From SHEBANQ to LAF-Fabric

Maybe you arrived here because you are interested in extending the possibilities of using the ETCBC database, after having reached the limits of what is possible in [SHEBANQ](http://shebanq.ancient-data.org).

Here is a link (back) to the [description](http://shebanq-doc.readthedocs.org/en/latest/texts/context.html) of the transition from SHEBANQ to LAF-Fabric.

And here is the corresponding query on SHEBANQ: [Yesh](http://shebanq.ancient-data.org/hebrew/query?id=556).

# Introduction to MQL

Coming from LAF-Fabric and its notebooks, you might wonder what MQL is.
MQL stands for **Mini Query Language**, which is a query language optimized for textual resources.
It is the query language of [EMDROS](http://emdros.org), a text database system written by Ulrik Sandborg-Petersen and based on the PhD thesis of Crist-Jan Doedens: [Text Databases. One Database Model and Several Retrieval Languages](http://books.google.nl/books?id=9ggOBRz1dO4C&dq=editions%3AISBN9051837291&source=gbs_book_other_versions).
The ETCBC Hebrew Text Database in its LAF form is the result of converting an EMDROS database into LAF.

MQL is good for detecting syntactical patterns. LAF is good for programmatically walking through the text and gathering information as you go. *Of course, we want to have the best of both worlds!*

The query language of this system, MQL, is a so-called *topographic* query language, meaning that the query instruction is at the same time a template for the query results. More formally, there is a correspondence between the structure of the query instruction and the structure of the query results, and this correspondence holds for the sequential order and the embedding order.

Put differently, MQL is a very convenient language for querying the data for tree fragments.

A specification of MQL can be found at the [Emdros docs page](http://emdros.org/docs.html).

In order to run this notebook, you need to have the [EMDROS software](http://sourceforge.net/projects/emdros/files/) installed. It is open source and there are binaries for Windows and Mac. The ETCBC database file is included in the laf-fabric-data working directory, which you can download from [DANS](http://www.persistent-identifier.nl/?identifier=urn%3Anbn%3Anl%3Aui%3A13-048i-71).

# LAF-Fabric and MQL

This notebook shows how you can integrate MQL with LAF-Fabric. This is what you can do:

* write an MQL query in a code cell as a Python string
* fire that query at the EMDROS database containing the ETCBC data
* get the results back in the form of sets of nodes of LAF-Fabric.

# Sheafs, straws, and grains

An MQL query has a form like this:

    select all objects
    in {1-40}
    where
    [phrase
        [word g_cons = 'H']
        [word]
    ]
    ..
    [phrase
        [word]
        [word]
    ]

After the ``where`` there is a sequence of objects, which in turn may contain objects.
The query result after firing this query is a so-called *sheaf*. It is a list of results or *straws*, where each straw looks like the sequence of objects after the ``where``. These objects are the *matched objects* or *grains*.
And here is the catch: a grain may contain a sheaf itself, because the objects inside objects also may have multiple subresults in the data.

In other words: a sheaf is a recursive structure: it is a list of straws, which are lists of grains, which are monads (words) or objects containing a sheaf.
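To make the recursion concrete, here is a minimal sketch that models a sheaf with plain Python lists and tuples and walks it depth first. It follows the description above (a grain is either an integer or a pair of a node and a subsheaf), but it is only an illustration: `nodes_in_sheaf` and `toy_sheaf` are hypothetical names, the node numbers are invented, and this is not the `Sheaf` class of the *etcbc* package.

```python
def nodes_in_sheaf(sheaf):
    """Yield every node mentioned anywhere in a sheaf, depth first.

    Assumed layout (an illustration, not the etcbc implementation):
    a sheaf is a list of straws, a straw is a list of grains, and a grain
    is either an int (a word) or a pair (node, subsheaf).
    """
    for straw in sheaf:
        for grain in straw:
            if isinstance(grain, int):
                yield grain                      # a word: nothing below it
            else:
                node, subsheaf = grain
                yield node                       # the enclosing object
                yield from nodes_in_sheaf(subsheaf)


# One straw with two phrase grains. The first phrase (invented node 100)
# matched in two ways, so its subsheaf holds two straws of words.
toy_sheaf = [
    [(100, [[1, 2], [2, 3]]), (200, [[7, 8]])],
]

all_nodes = list(nodes_in_sheaf(toy_sheaf))
print(all_nodes)
```

Note that word 2 is visited twice, because it takes part in both alternatives inside the first phrase.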

The sheaf is a very economical representation of the set of tree fragments that are the result of an MQL query.

Yet for most purposes it is necessary to have a list of individual results. We provide a method to generate results from a sheaf.
What this method does can be thought of as making copies of the sheaf, and wherever there is a sheaf (a list of straws), it replaces the sheaf by choosing a single straw. The results correspond to all possible ways of making those choices.
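The choice-making described above can be sketched with `itertools.product`. This is an assumption about how such an expansion could be written, not the code that the *etcbc* package actually uses; the function names and the toy data are hypothetical, and the sheaf layout is the simplified one in which a grain is an integer or a pair `(node, subsheaf)`.

```python
from itertools import product


def expand_sheaf(sheaf):
    """Yield every individual result encoded by a sheaf (a list of straws)."""
    for straw in sheaf:
        yield from expand_straw(straw)


def expand_straw(straw):
    """Combine one alternative per grain, in all possible ways."""
    alternatives = [list(expand_grain(grain)) for grain in straw]
    for combination in product(*alternatives):
        yield tuple(combination)


def expand_grain(grain):
    """A word has one alternative; an object has one per result of its subsheaf."""
    if isinstance(grain, int):
        yield grain
    else:
        node, subsheaf = grain
        for sub_result in expand_sheaf(subsheaf):
            yield (node, sub_result)


# Invented data: the first object matched in two ways, the second in one.
toy_sheaf = [
    [(100, [[1], [2]]), (200, [[3]])],
]

results = list(expand_sheaf(toy_sheaf))
for result in results:
    print(result)
```

The toy sheaf stores three inner straws but expands to two results, one for each choice inside the first object. With deeper nesting the number of results grows as the product of the choices, which is why the sheaf is the more compact form.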

# MQL API

Inside the *etcbc* package there is a module *mql*.
This module exposes two classes: ``MQL`` and ``Sheaf``.

    from etcbc.mql import MQL

You initialize the MQL object after loading LAF-Fabric by passing the API as a parameter:

    Q = MQL(API)
 
If you have a query, e.g. the example above as a string in a variable ``query``, you can say:

    sheaf = Q.mql(query)

Then you have the results of the query in ``sheaf``. It is a list of lists of tuples (corresponding to *sheaf*, *straw*, *grain*),
where a grain is either an integer, which is the node corresponding to a monad (word object), or a tuple ``(node, subsheaf)``, where ``node`` corresponds to an object of another type, containing a sheaf of subobjects.

``sheaf`` is an object of class ``Sheaf``. We need one of its methods:

* ``results()``: generates (as a generator) the list of results that is represented by the sheaf.
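Because ``results()`` is a generator, results are produced one at a time, so you can look at the first few without building the whole list. The sketch below shows this with `itertools.islice`. To keep it runnable without EMDROS, `fake_results` is a hypothetical stand-in for `sheaf.results()`, and its node numbers are invented.

```python
from itertools import islice


def fake_results():
    """Hypothetical stand-in for sheaf.results(); node numbers are invented."""
    for word in (11, 22, 33, 44):
        # One enclosing object (900) containing a single word.
        yield ((900, ((word,),)),)


# Take only the first two results; the remaining ones are never generated.
first_two = list(islice(fake_results(), 2))
for result in first_two:
    print(result)
```

With a real sheaf you would pass `sheaf.results()` to `islice` instead of `fake_results()`.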

In [1]:
import sys
import collections
import subprocess

from lxml import etree

import laf
from laf.fabric import LafFabric
from etcbc.preprocess import prepare
from etcbc.mql import MQL
fabric = LafFabric()

 0.00s This is LAF-Fabric 4.4.6
API reference: http://laf-fabric.readthedocs.org/en/latest/texts/API-reference.html
Feature doc: http://shebanq-doc.readthedocs.org/en/latest/texts/welcome.html



In [2]:
API = fabric.load('etcbc4', '--', 'mql', {
    "xmlids": {"node": False, "edge": False},
    "features": ('''
        oid otype monads
        g_word g_word_utf8 g_cons lex
        typ code function rela det
        book chapter verse label
    ''','''
        functional_parent
    '''),
    "prepare": prepare,
}, verbose='DETAIL')
exec(fabric.localnames.format(var='fabric'))
Q = MQL(API)

 0.00s LOADING API: please wait ... 
 0.00s DETAIL: COMPILING m: UP TO DATE
 0.10s INFO: USING DATA COMPILED AT: 2014-07-23T09-31-37
 0.10s DETAIL: COMPILING a: UP TO DATE
 0.11s DETAIL: load main: G.node_anchor_min
 0.81s DETAIL: load main: G.node_anchor_max
 0.87s DETAIL: load main: G.node_sort
 0.92s DETAIL: load main: G.node_sort_inv
 1.32s DETAIL: load main: G.edges_from
 1.39s DETAIL: load main: G.edges_to
 1.47s DETAIL: load main: F.etcbc4_db_monads [node] 
 2.23s DETAIL: load main: F.etcbc4_db_oid [node] 
 2.94s DETAIL: load main: F.etcbc4_db_otype [node] 
 3.67s DETAIL: load main: F.etcbc4_ft_code [node] 
 3.74s DETAIL: load main: F.etcbc4_ft_det [node] 
 4.04s DETAIL: load main: F.etcbc4_ft_function [node] 
 4.18s DETAIL: load main: F.etcbc4_ft_g_cons [node] 
 4.38s DETAIL: load main: F.etcbc4_ft_g_word [node] 
 4.63s DETAIL: load main: F.etcbc4_ft_g_word_utf8 [node] 
 4.97s DETAIL: load main: F.etcbc4_ft_lex [node] 
 5.18s DETAIL: load main: F.etcbc4_ft_rela [node] 
 5.59s D

In [8]:
fabric.lafapi.all_features_index['n']

defaultdict(... at 0x106739268>, {'det': [('etcbc4', 'ft')], 'lex_utf8': [('etcbc4', 'ft')], 'otype': [('etcbc4', 'db')], 'prs': [('etcbc4', 'ft')], 'typ': [('etcbc4', 'ft')], 'trailer_utf8': [('etcbc4', 'ft')], 'g_word': [('etcbc4', 'ft')], 'language': [('etcbc4', 'ft')], 'g_nme_utf8': [('etcbc4', 'ft')], 'g_prs': [('etcbc4', 'ft')], 'verse': [('etcbc4', 'sft')], 'g_cons': [('etcbc4', 'ft')], 'mother_object_type': [('etcbc4', 'ft')], 'g_word_utf8': [('etcbc4', 'ft')], 'maxmonad': [('etcbc4', 'db')], 'monads': [('etcbc4', 'db')], 'book': [('etcbc4', 'sft')], 'vbe': [('etcbc4', 'ft')], 'ls': [('etcbc4', 'ft')], 'g_pfm': [('etcbc4', 'ft')], 'nu': [('etcbc4', 'ft')], 'dist_unit': [('etcbc4', 'ft')], 'lex': [('etcbc4', 'ft')], 'vbs': [('etcbc4', 'ft')], 'g_nme': [('etcbc4', 'ft')], 'g_cons_utf8': [('etcbc4', 'ft')], 'pdp': [('etcbc4', 'ft')], 'g_vbe': [('etcbc4', 'ft')], 'nme': [('etcbc4', 'ft')], 'domain': [('etcbc4', 'ft')], 'oid': [('etcbc4', 'db')], 'st': [('etcbc4', 'ft')], 'function'

# The MQL query

Here is a query that selects two lexemes.
We want some context information of the result occurrences, so we pack the search item

    word lex="JC/" or lex=">JN/"

into a nest of objects around it, such as book, clause and phrase.

In [3]:
yesh_query = '''
select all objects where
[book [chapter [verse
[clause
    [clause_atom
        [phrase
            [phrase_atom
                [word lex="JC/" or lex=">JN/"]
            ]
        ]
    ]
]
]]]
'''
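The packing of a search item into a nest of enclosing objects can also be done programmatically. The following is a sketch only: `nest_query` is a hypothetical helper, not part of *etcbc* or EMDROS, and it assumes that MQL does not care about line breaks and indentation, as the mixed layout of the query above suggests.

```python
def nest_query(inner, containers):
    """Wrap an innermost MQL object block in enclosing object types.

    `containers` lists the enclosing object types from outermost to innermost.
    Hypothetical helper for illustration; it only builds a string.
    """
    block = '[{}]'.format(inner)
    for otype in reversed(containers):
        block = '[{} {}]'.format(otype, block)
    return 'select all objects where\n' + block


query = nest_query(
    'word lex="JC/" or lex=">JN/"',
    ['book', 'chapter', 'verse', 'clause', 'clause_atom', 'phrase', 'phrase_atom'],
)
print(query)
```

Generating the query this way keeps the list of container types in one place, so the same list can later be used to decide how many nodes to expect in each result.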

# Run the query

The following line of code invokes your local MQL processor (from your local EMDROS installation), and delivers the results as a compact sheaf.

In [4]:
sheaf = Q.mql(yesh_query)

# Processing the results

Now we process all results. Every result is a structure in the same shape as our query, filled with nodes that correspond to the book, chapter, verse, clause, phrase etc. of that result.

More precisely, the shape of each result looks like this:

    ((book_node,
      ((chapter_node,
        ((verse_node,
          ((clause_node,
            ((clause_atom_node,
              ((phrase_node,
                ((phrase_atom_node,
                  ((word_node,),
                )),
              )),
            )),
          )),
        )),
      )),
    )),
    )

We iterate through all these results, identify the nodes in them, and look up the desired features of these nodes.
This data structure looks a bit daunting, but the pattern for it is easy to obtain.
We just print the first result of the sheaf, copy it into our program code, and then replace the numbers by variables.


In [7]:
for x in sheaf.results():
    print(x)
    break

((1372281, ((1372321, ((1417966, ((426701, ((514687, ((605118, ((859791, ((771,),)),)),)),)),)),)),)),)
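As an alternative to copying this shape by hand, a small recursive helper can flatten a result into a plain sequence of nodes. This is a sketch under an assumption: it only works when every result contains the same number of nodes in the same order, as is the case for a query like this one, where each object contains exactly one inner object. `flatten` is a hypothetical helper, not part of the *etcbc* package; the result tuple is the one printed above.

```python
def flatten(result):
    """Yield the integers in a nested tuple, left to right."""
    for item in result:
        if isinstance(item, tuple):
            yield from flatten(item)
        else:
            yield item


# The first result, exactly as printed above.
first = ((1372281, ((1372321, ((1417966, ((426701, ((514687, ((605118, ((859791, ((771,),)),)),)),)),)),)),)),)

# Unpack in query order: outermost object first, word last.
(book, chapter, verse, clause,
 clause_atom, phrase, phrase_atom, word) = flatten(first)
print(book, word)
```

The explicit nested pattern has the advantage that a result with an unexpected shape fails loudly at the unpacking step, whereas flattening hides the nesting.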


For each result, we write out a line of feature information.

In [26]:
nresults = 0
n_yesh = 0
n_ein = 0
for ((book,
        ((chapter,
            ((verse,
                ((clause,
                    ((clause_atom,
                        ((phrase,
                            ((phrase_atom,
                                ((word,),
                            )),
                        )),
                    )),
                )),
            )),
        )),
    )),
) in sheaf.results():
    nresults += 1
    lex = F.lex.v(word)
    # The query only matches the two lexemes JC/ and >JN/
    if lex == 'JC/':
        n_yesh += 1
    else:
        n_ein += 1

    print('{:<15s} {:>3}:{:>3} {:<5s} {:<15s} {:>5}-{:<3} {}-{} {}'.format(
        F.book.v(book),
        F.chapter.v(chapter),
        F.verse.v(verse),
        lex,
        F.g_word.v(word),
        F.typ.v(clause),
        F.code.v(clause_atom),
        F.function.v(phrase),
        F.det.v(phrase_atom),
        F.g_word_utf8.v(word),
    ))
print('There are {} results, {} yesh and {} ein'.format(nresults, n_yesh, n_ein))

Genesis 2: 5 >JN/ >A80JIN NmCl-402 Nega-NA אַ֔יִן
Genesis 5: 24 >JN/ >;JNE85N.W. NmCl-407 NCoS-NA אֵינֶ֕נּוּ
Genesis 7: 8 >JN/ >;JNE73N.@H AjCl-10 NCoS-NA אֵינֶ֖נָּה
Genesis 11: 30 >JN/ >;71JN NmCl-107 NCop-NA אֵ֥ין
Genesis 18: 24 JC/ J;91C NmCl-101 Exst-und יֵ֛שׁ
Genesis 19: 31 >JN/ >;70JN NmCl-402 NCop-NA אֵ֤ין
Genesis 20: 7 >JN/ >;75JN:K@74 Ptcp-663 NCoS-NA אֵֽינְךָ֣
Genesis 20: 11 >JN/ >;JN& NmCl-999 NCop-NA אֵין
Genesis 23: 8 JC/ J;74C NmCl-999 Exst-und יֵ֣שׁ
Genesis 24: 23 JC/ J;94C NmCl-103 Exst-und יֵ֧שׁ
Genesis 24: 42 JC/ JEC:K@& Ptcp-660 ExsS-det יֶשְׁךָ
Genesis 24: 49 JC/ JEC:KE63M Ptcp-660 ExsS-det יֶשְׁכֶ֨ם
Genesis 28: 16 JC/ J;74C NmCl-999 Exst-und יֵ֣שׁ
Genesis 28: 17 >JN/ >;74JN NmCl-100 NCop-NA אֵ֣ין
Genesis 30: 1 >JN/ >A73JIN Ellp-603 Nega-NA אַ֖יִן
Genesis 30: 33 >JN/ >;JNEN.W.04 AjCl-10 NCoS-NA אֵינֶנּוּ֩
Genesis 31: 2 >JN/ >;JNEN.91W. NmCl-407 NCoS-NA אֵינֶנּ֛וּ
Genesis 31: 5 >JN/ >;JNE71N.W. NmCl-506 NCoS-NA אֵינֶ֥נּוּ
Genesis 31: 29 JC/ JEC& NmCl-102 Exst-und יֶש