<a href="http://laf-fabric.readthedocs.org/en/latest/" target="_blank"><img align="left" src="images/laf-fabric-xsmall.png"/></a>
<a href="http://emdros.org" target="_blank"><img align="left" src="files/images/Emdros-xsmall.png"/></a>
<a href="http://www.persistent-identifier.nl/?identifier=urn%3Anbn%3Anl%3Aui%3A13-048i-71" target="_blank"><img align="left" src="images/etcbc4easy-small.png"/></a>
<a href="http://www.godgeleerdheid.vu.nl/etcbc" target="_blank"><img align="right" src="images/VU-ETCBC-xsmall.png"/></a>
<a href="http://tla.mpi.nl" target="_blank"><img align="right" src="images/TLA-xsmall.png"/></a>
<a href="http://www.dans.knaw.nl" target="_blank"><img align="right" src="images/DANS-xsmall.png"/></a>

# Monad Query Language

The LAF resource that holds the ETCBC Hebrew Text Database is the result of converting an EMDROS database into LAF.
[EMDROS](http://emdros.org) is a text database system written by Ulrik Sandborg-Petersen, based on the PhD thesis of Crist-Jan Doedens: [Text Databases. One Database Model and Several Retrieval Languages](http://books.google.nl/books?id=9ggOBRz1dO4C&dq=editions%3AISBN9051837291&source=gbs_book_other_versions).

The query language of this system, MQL, is a so-called *topographic* query language: the query instruction is at the same time a template for the query results. More formally, the structure of the query instruction corresponds to the structure of the query results, and this correspondence holds for both the sequential order and the embedding order.

In practice, this makes MQL a very convenient language for querying the data for tree fragments.

A specification of MQL can be found at the [Emdros docs page](http://emdros.org/docs.html).

In order to run this notebook, you need to have the [EMDROS software](http://sourceforge.net/projects/emdros/files/) installed. It is open source, and binaries are available for Windows and Mac. The ETCBC database file is included in the laf-fabric-data working directory, which you can download from [DANS](http://www.persistent-identifier.nl/?identifier=urn%3Anbn%3Anl%3Aui%3A13-048i-71).

# LAF Fabric and MQL

This notebook shows how you can integrate MQL with LAF-Fabric. You can:

* write an MQL query in a code cell as a Python string
* fire that query at the EMDROS database containing the ETCBC data
* get the results back as sets of LAF-Fabric nodes.

Because the LAF data has been migrated from the EMDROS data, we have a mapping from EMDROS object identifiers to LAF nodes.
We apply this mapping to the query results in order to translate them into sets of nodes.
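A minimal sketch of that translation step, assuming the query results are plain nested Python data: a list of straws, each a list of grains, where a grain is a bare integer id, a tuple ``(id,)``, or a tuple ``(id, subsheaf)``. It also assumes the mapping is available as a plain dict. The names ``map_ids`` and ``oid_to_node`` are hypothetical; this is not the code inside ``etcbc.mql``.

```python
# Hypothetical sketch: translate EMDROS object ids in a nested sheaf into
# LAF-Fabric nodes. Assumed grain shapes: a bare int id, a tuple (id,),
# or a tuple (id, subsheaf). `oid_to_node` is an assumed plain dict.
def map_ids(sheaf, oid_to_node):
    """Return a copy of `sheaf` with every object id replaced by its node."""
    mapped = []
    for straw in sheaf:
        new_straw = []
        for grain in straw:
            if isinstance(grain, int):            # bare id
                new_straw.append(oid_to_node[grain])
            elif len(grain) == 1:                 # (id,) without nested objects
                new_straw.append((oid_to_node[grain[0]],))
            else:                                 # (id, subsheaf): recurse
                oid, subsheaf = grain
                new_straw.append((oid_to_node[oid], map_ids(subsheaf, oid_to_node)))
        mapped.append(new_straw)
    return mapped


# Usage with made-up ids and nodes:
oid_to_node = {10: 100, 11: 101, 12: 102}
print(map_ids([[(10, [[11, 12]])]], oid_to_node))  # → [[(100, [[101, 102]])]]
```

Because the translation rebuilds the nested structure rather than flattening it, the embedding and sequential order of the results survive the mapping.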

# Sheafs, straws, and grains

An MQL query has a form like this:

    select all objects
    in {1-40}
    where
        [phrase
            [word g_cons = 'H']
            [word]
        ]
        ..
        [phrase
            [word]
            [word]
        ]

After the ``where`` there is a sequence of objects, each of which may in turn contain objects.
The result of firing this query is a so-called *sheaf*. It is a list of results, or *straws*, where each straw mirrors the sequence of objects after the ``where``. These objects are the *matched objects*, or *grains*.
And here is the catch: a grain may itself contain a sheaf, because an object nested inside another object may also have multiple matches in the data.

In other words, a sheaf is a recursive structure: it is a list of straws, which are lists of grains, which are monads (words) or objects containing a sheaf.
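To make the recursion concrete, here is a hypothetical sheaf for the two-phrase query above, written as plain Python data with made-up node numbers, together with a small function that collects every node in it into a set. The grain shapes (bare int, ``(node,)`` or ``(node, subsheaf)``) are an assumption modeled on the description of ``sheaf.data`` in the API section; ``nodes_in`` is not part of the ``etcbc.mql`` API.

```python
# Hypothetical sheaf for the two-phrase query above; node numbers are made up.
# Assumed grain shapes: bare int node, (node,), or (node, subsheaf).
example = [
    # straw 1: phrase 1 .. phrase 2, each with one matching word pair
    [(1, [[11, 12]]), (2, [[21, 22]])],
    # straw 2: phrase 1 .. phrase 3, where phrase 3 has two matching word pairs
    [(1, [[11, 12]]), (3, [[31, 32], [32, 33]])],
]


def nodes_in(sheaf):
    """Collect every node occurring anywhere in a nested sheaf into a set."""
    found = set()
    for straw in sheaf:
        for grain in straw:
            if isinstance(grain, int):   # word node stored bare
                found.add(grain)
                continue
            found.add(grain[0])          # (node,) or (node, subsheaf)
            if len(grain) > 1:
                found |= nodes_in(grain[1])
    return found


print(sorted(nodes_in(example)))  # → [1, 2, 3, 11, 12, 21, 22, 31, 32, 33]
```

Note that phrase 3 occurs once as a grain but carries a subsheaf with two straws: the alternatives live inside the grain instead of duplicating the whole straw.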

The sheaf is a very economical representation of the set of tree fragments that result from an MQL query.

Yet for some purposes you need a flat list of ordinary results. We provide a method to generate results from a sheaf.
Think of it as making copies of the sheaf: wherever there is a sheaf (a list of straws), it is replaced by a single straw chosen from it. The results correspond to all possible ways of making those choices.

In other words, a ``result`` is a recursive structure: it is a list of grains, which are monads (words) or objects containing a result.

Put otherwise: a ``result`` is a simplified sheaf without the aggregating sheaf level, leaving only straws and grains.
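The expansion can be sketched as a Cartesian product: within a straw, each grain contributes its alternatives, and the results of the straw are all combinations of those alternatives. The sketch assumes plain nested data whose grains are ints, ``(node,)`` or ``(node, subsheaf)``, with made-up node numbers. ``results`` and ``straw_results`` here are hypothetical stand-ins, not the implementation of ``Sheaf.results()``.

```python
from itertools import product

# Made-up sheaf; assumed grain shapes: int, (node,), or (node, subsheaf).
example = [
    [(1, [[11, 12]]), (2, [[21, 22]])],
    [(1, [[11, 12]]), (3, [[31, 32], [32, 33]])],
]


def results(sheaf):
    """Yield each result: a straw in which every subsheaf is replaced by one of its results."""
    for straw in sheaf:
        yield from straw_results(straw)


def straw_results(straw):
    # Collect the alternatives for each grain; the straw's results are
    # the Cartesian product of those alternatives.
    options = []
    for grain in straw:
        if isinstance(grain, int) or len(grain) == 1:
            options.append([grain])              # no nested choices
        else:
            node, subsheaf = grain
            options.append([(node, sub) for sub in results(subsheaf)])
    for combo in product(*options):
        yield list(combo)


for r in results(example):
    print(r)
# → [(1, [11, 12]), (2, [21, 22])]
#   [(1, [11, 12]), (3, [31, 32])]
#   [(1, [11, 12]), (3, [32, 33])]
```

The second straw yields two results because phrase 3 offers two word pairs; this is how a compact sheaf can expand into many results.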

# MQL API

Inside the *etcbc* package there is a module *mql*.
This module exposes two classes: ``MQL`` and ``Sheaf``.

    from etcbc.mql import MQL

You initialize the MQL object after loading LAF-Fabric by passing the API as a parameter:

    Q = MQL(API)
   
If you have a query, e.g. the example above as a string in a variable ``query``, you can say:

    sheaf = Q.mql(query)

Then you have the results of the query in ``sheaf``. Its ``data`` attribute, ``sheaf.data``, is a list of lists of tuples (corresponding to *sheaf*, *straw*, *grain*),
where a grain is either an integer, which is the node corresponding to a monad (word object), or a tuple ``(node, subsheaf)``, where ``node`` corresponds to an object of another type that contains a sheaf of subobjects.

``sheaf`` is an object of class ``Sheaf``, which has the following methods:

* ``render(callable)``: prints the sheaf in a pretty format; each word is rendered by applying ``callable`` to its node.
* ``compact(callable)``: returns a compact string representation of the sheaf; ``callable`` has the same meaning as above.
* ``results()``: a generator that yields the results represented by the sheaf.
* ``compact_results(callable)``: returns compact representations of the results of the sheaf.
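To illustrate the role of the ``callable`` argument, here is a hypothetical ``compact``-style function over plain nested data (grains are ints, ``(node,)`` or ``(node, subsheaf)``): word nodes are turned into text by the callable, and other objects are shown as their node number followed by their bracketed subsheaf. The bracket-and-bar format is made up for this sketch and does not reproduce what ``Sheaf.compact`` prints.

```python
def compact(sheaf, word_text):
    """Render a nested sheaf as one line (hypothetical format, not Sheaf.compact).

    Straws are separated by ' | ', a word node is shown as word_text(node),
    and another object as its node number followed by its bracketed subsheaf.
    Assumed grain shapes: int, (node,), or (node, subsheaf).
    """
    straws = []
    for straw in sheaf:
        parts = []
        for grain in straw:
            if isinstance(grain, int):
                parts.append(word_text(grain))
            elif len(grain) == 1:
                parts.append(word_text(grain[0]))
            else:
                node, subsheaf = grain
                parts.append(f"{node}{compact(subsheaf, word_text)}")
        straws.append(" ".join(parts))
    return "[" + " | ".join(straws) + "]"


# Usage: a dict lookup stands in for a feature function such as F.g_cons.v
words = {11: "H", 12: "MLK"}
print(compact([[(1, [[11, 12]])]], words.get))  # → [1[H MLK]]
```

Passing a function rather than a feature name keeps the renderer independent of which feature (consonantal text, lexeme, Unicode text) you want to see.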

In [1]:
import sys
import collections
import subprocess

from lxml import etree

import laf
from laf.fabric import LafFabric
from etcbc.preprocess import prepare
from etcbc.mql import MQL
fabric = LafFabric()

  0.00s This is LAF-Fabric 4.4.6
API reference: http://laf-fabric.readthedocs.org/en/latest/texts/API-reference.html
Feature doc: http://shebanq-doc.readthedocs.org/en/latest/texts/welcome.html



In [2]:
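API = fabric.load('etcbc4', '--', 'mql', {       # data source, annotation package ('--' = none), task name
    "xmlids": {"node": False, "edge": False},    # no mappings to XML identifiers needed
    "features": ('''
        oid otype monads
        g_word_utf8 g_cons lex function
        book chapter verse label
    ''','''
        functional_parent
    '''),                                        # node features, edge features
    "prepare": prepare,                          # ETCBC-specific preprocessing
}, verbose='DETAIL')
exec(fabric.localnames.format(var='fabric'))      # expose API names such as F and outfile as local names
Q = MQL(API)                                      # MQL interface bound to the loaded API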

  0.00s LOADING API: please wait ... 
  0.00s DETAIL: COMPILING m: UP TO DATE
  0.00s INFO: USING DATA COMPILED AT: 2014-07-23T09-31-37
  0.01s DETAIL: COMPILING a: UP TO DATE
  0.01s DETAIL: load main: G.node_anchor_min
  0.08s DETAIL: load main: G.node_anchor_max
  0.14s DETAIL: load main: G.node_sort
  0.20s DETAIL: load main: G.node_sort_inv
  0.63s DETAIL: load main: G.edges_from
  0.70s DETAIL: load main: G.edges_to
  0.77s DETAIL: load main: F.etcbc4_db_monads [node] 
  1.48s DETAIL: load main: F.etcbc4_db_oid [node] 
  2.17s DETAIL: load main: F.etcbc4_db_otype [node] 
  2.81s DETAIL: load main: F.etcbc4_ft_function [node] 
  2.93s DETAIL: load main: F.etcbc4_ft_g_cons [node] 
  3.11s DETAIL: load main: F.etcbc4_ft_g_word_utf8 [node] 
  3.43s DETAIL: load main: F.etcbc4_ft_lex [node] 
  3.63s DETAIL: load main: F.etcbc4_sft_book [node] 
  3.65s DETAIL: load main: F.etcbc4_sft_chapter [node] 
  3.66s DETAIL: load main: F.etcbc4_sft_label [node] 
  3.68s DETAIL: load main: F.etcb

In [8]:
qu1 = '''
select all objects where
[subphrase
    [word lex="M<FH/"]
]
'''

qu2 = '''
select all objects
in {1-40}
where
    [phrase
        [word focus g_cons = 'H']
        [word]
    ]
    ..
    [phrase
        [word focus]
        [word]
    ]
'''
qu3 = '''
get objects having monads in {84134-8444}
[clause get mother]
'''
qu4 = '''
select all objects
where
[chapter book IN (Judices)
[word FOCUS 
sp = nmpr AND g_uvf = "~@H" OR
sp = subs AND g_uvf = "~@H" OR
sp = advb AND g_uvf = "~@H" 
]
]
'''
q5 = '''
select all objects
where

[word FOCUS 
sp = nmpr AND g_uvf = "~@H" OR
sp = subs AND g_uvf = "~@H" OR
sp = advb AND g_uvf = "~@H" 
]
'''
qudo = q5  # the query that the cells below will run

In [12]:
# With format='xml' the query output comes back as XML bytes; decode and save it
sheaf = Q.mql(qudo, format='xml')
o = outfile('qudo.xml')
o.write(sheaf.decode(encoding='utf8'))
o.close()

In [9]:
sheaf = Q.mql(qudo)
sheaf.data

[[(2899,)],
 [(3439,)],
 [(4790,)],
 [(4794,)],
 [(4930,)],
 [(5460,)],
 [(5583,)],
 [(5587,)],
 [(5637,)],
 [(5671,)],
 [(5681,)],
 [(5699,)],
 [(5749,)],
 [(5881,)],
 [(6136,)],
 [(6138,)],
 [(6140,)],
 [(6142,)],
 [(6403,)],
 [(6407,)],
 [(6748,)],
 [(6754,)],
 [(7840,)],
 [(7894,)],
 [(8185,)],
 [(8428,)],
 [(8448,)],
 [(8545,)],
 [(8638,)],
 [(8808,)],
 [(8825,)],
 [(8860,)],
 [(8876,)],
 [(8882,)],
 [(8912,)],
 [(8921,)],
 [(8938,)],
 [(9250,)],
 [(9526,)],
 [(10971,)],
 [(11220,)],
 [(11283,)],
 [(11455,)],
 [(11665,)],
 [(11736,)],
 [(11995,)],
 [(12150,)],
 [(12404,)],
 [(12514,)],
 [(12580,)],
 [(12698,)],
 [(13000,)],
 [(13009,)],
 [(14435,)],
 [(14518,)],
 [(14520,)],
 [(14577,)],
 [(14602,)],
 [(14634,)],
 [(14675,)],
 [(14710,)],
 [(14715,)],
 [(14759,)],
 [(14761,)],
 [(14763,)],
 [(14765,)],
 [(14953,)],
 [(14991,)],
 [(16707,)],
 [(17487,)],
 [(18132,)],
 [(18346,)],
 [(18377,)],
 [(18381,)],
 [(19167,)],
 [(19355,)],
 [(19404,)],
 [(20374,)],
 [(20438,)],
 [(20479,)],

In [11]:
print(sheaf.compact(F.g_cons.v))

'M<LH'
'M<LH'
'GRRH'
'SDMH'
'SPRH'
'>RYH'
'>RYH'
'>RYH'
'HRH'
'NGBH'
'MYRJMH'
'MYRJMH'
'MYRJMH'
'NGBH'
'YPNH'
'NGBH'
'QDMH'
'JMH'
'CMH'
'HRH'
'XWYH'
'CMJMH'
'>RYH'
'>HLH'
'SDMH'
'SDMH'
'>RYH'
'PTXH'
'BJTH'
'XWYH'
'HRH'
'HRH'
'CMH'
'CMH'
'CMH'
'CMH'
'Y<RH'
'>RYH'
'CMH'
'CMH'
'CMH'
'CMH'
'<JNH'
'XWYH'
'BJTH'
'<JNH'
'>RYH'
'>HLH'
'QDMH'
'CMH'
'>CWRH'
'GRRH'
'MYRJMH'
'XRNH'
'PDNH'
'BJTH'
'PDNH'
'PDNH'
'PDNH'
'XRNH'
'>RYH'
'CMJMH'
'JMH'
'QDMH'
'YPNH'
'NGBH'
'>RYH'
'CMH'
'>RYH'
'>RYH'
'>RYH'
'F<JRH'
'F<JRH'
'SKTH'
'LWZH'
'>PRTH'
'>PRTH'
'>RYH'
'CKMH'
'DTJNH'
'BRH'
'MYRJMH'
'MYRJMH'
'C>LH'
'>RYH'
'TMNTH'
'TMNTH'
'TMNTH'
'MYRJMH'
'CMH'
'BJTH'
'XWYH'
'XWYH'
'XWYH'
'XWYH'
'MYRJMH'
'CMH'
'>RYH'
'>RYH'
'C>WLH'
'BJTH'
'BJTH'
'BJTH'
'BJTH'
'BJTH'
'>RYH'
'XDRH'
'CMH'
'>RYH'
'<JRH'
'BJTH'
'>RYH'
'C>LH'
'C>LH'
'MYRJMH'
'>RYH'
'B>RH'
'MYRJMH'
'MYRJMH'
'MYRJMH'
'MYRJMH'
'MYRJMH'
'MYRJMH'
'MYRJMH'
'GCNH'
'>RYH'
'GCNH'
'BJTH'
'MYRJMH'
'>PRTH'
'>RYH'
'CMH'
'CMH'
'CMH'
'CMH'
'>RYH'
'MYRJMH'
'MYRJMH'
'J>RH'
'

In [10]:
print(sheaf.compact_results(F.g_cons.v))

 'M<LH'
 'M<LH'
 'GRRH'
 'SDMH'
 'SPRH'
 '>RYH'
 '>RYH'
 '>RYH'
 'HRH'
 'NGBH'
 'MYRJMH'
 'MYRJMH'
 'MYRJMH'
 'NGBH'
 'YPNH'
 'NGBH'
 'QDMH'
 'JMH'
 'CMH'
 'HRH'
 'XWYH'
 'CMJMH'
 '>RYH'
 '>HLH'
 'SDMH'
 'SDMH'
 '>RYH'
 'PTXH'
 'BJTH'
 'XWYH'
 'HRH'
 'HRH'
 'CMH'
 'CMH'
 'CMH'
 'CMH'
 'Y<RH'
 '>RYH'
 'CMH'
 'CMH'
 'CMH'
 'CMH'
 '<JNH'
 'XWYH'
 'BJTH'
 '<JNH'
 '>RYH'
 '>HLH'
 'QDMH'
 'CMH'
 '>CWRH'
 'GRRH'
 'MYRJMH'
 'XRNH'
 'PDNH'
 'BJTH'
 'PDNH'
 'PDNH'
 'PDNH'
 'XRNH'
 '>RYH'
 'CMJMH'
 'JMH'
 'QDMH'
 'YPNH'
 'NGBH'
 '>RYH'
 'CMH'
 '>RYH'
 '>RYH'
 '>RYH'
 'F<JRH'
 'F<JRH'
 'SKTH'
 'LWZH'
 '>PRTH'
 '>PRTH'
 '>RYH'
 'CKMH'
 'DTJNH'
 'BRH'
 'MYRJMH'
 'MYRJMH'
 'C>LH'
 '>RYH'
 'TMNTH'
 'TMNTH'
 'TMNTH'
 'MYRJMH'
 'CMH'
 'BJTH'
 'XWYH'
 'XWYH'
 'XWYH'
 'XWYH'
 'MYRJMH'
 'CMH'
 '>RYH'
 '>RYH'
 'C>WLH'
 'BJTH'
 'BJTH'
 'BJTH'
 'BJTH'
 'BJTH'
 '>RYH'
 'XDRH'
 'CMH'
 '>RYH'
 '<JRH'
 'BJTH'
 '>RYH'
 'C>LH'
 'C>LH'
 'MYRJMH'
 '>RYH'
 'B>RH'
 'MYRJMH'
 'MYRJMH'
 'MYRJMH'
 'MYRJMH'
 'MYRJMH'
 'MYR

In [8]:
def testq(q):
    """Run MQL query q and print how many results its sheaf represents."""
    sheaf = Q.mql(q)
    print("{} results".format(sheaf.nresults()))

In [14]:
testq('''
select all objects
in {1-4000}
where
    [phrase
        [word focus g_cons = 'H']
        [word]
    ]
    ..
    [phrase
        [word focus]
        [word]
    ]
''')

299591 results
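Numbers like this can be large, so it helps to count results without generating them. One way to do that, sketched here on plain nested data whose grains are ints, ``(node,)`` or ``(node, subsheaf)``, is to recurse: a straw contributes the product of the counts of its nested subsheaves, and a sheaf contributes the sum over its straws. We don't know whether ``Sheaf.nresults()`` is implemented this way; ``count_results`` and the example data are hypothetical.

```python
# Made-up sheaf; assumed grain shapes: int, (node,), or (node, subsheaf).
example = [
    [(1, [[11, 12]]), (2, [[21, 22]])],
    [(1, [[11, 12]]), (3, [[31, 32], [32, 33]])],
]


def count_results(sheaf):
    """Count the results a sheaf expands to, without generating them."""
    total = 0
    for straw in sheaf:
        n = 1
        for grain in straw:
            # Only grains with a nested subsheaf multiply the number of choices.
            if not isinstance(grain, int) and len(grain) > 1:
                n *= count_results(grain[1])
        total += n
    return total


print(count_results(example))  # → 3
```

This visits every grain once, so the work grows with the size of the sheaf rather than with the number of results, which matters when a query yields hundreds of thousands of results.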
