<a href="http://laf-fabric.readthedocs.org/en/latest/" target="_blank"><img align="left" src="images/laf-fabric-small.png"/></a>
<a href="http://www.persistent-identifier.nl/?identifier=urn%3Anbn%3Anl%3Aui%3A13-048i-71" target="_blank"><img align="left" src="images/DANS-logo_small.png"/></a>
<a href="http://www.godgeleerdheid.vu.nl/etcbc" target="_blank"><img align="right" src="images/VU-ETCBC-small.png"/></a>
<a href="https://www.academic-bible.com/en/online-bibles/biblia-hebraica-stuttgartensia-bhs/read-the-bible-text/" target="_blank"><img align="right" src="files/images/DBG-small.png"/></a>

# Attributives

We want to list all nouns together with their adjectival modifiers.
To that end we produce a tab-separated file of phrases that contain a noun and adjectival modifiers.
The columns are:

1. passage label
1. phrase text
1. phrase gloss
1. head of an attributive subphrase
1. attributive subphrase
1. number of words in the head
1. number of nouns in the head

Hebrew text is represented in ETCBC consonantal transcription, which makes it easy to import into Excel.
It is not difficult to generate fully vocalized Hebrew as well, and we write that to a second file, but you need OpenOffice to open that csv file.
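
For readers who would rather process the file in Python than in a spreadsheet, here is a minimal sketch of reading the column layout described above with the standard `csv` module. The sample text is an assumption made for illustration: the header uses the column names this notebook writes, the data row is copied from the output shown at the end, and `io.StringIO` stands in for the real file. `read_rows` is a hypothetical helper, not part of LAF-Fabric.

```python
import csv
import io

# Header plus one data row in the tab-separated layout of this notebook.
sample = (
    "passage\tphrase_text\tphrase_gloss\thead\tattributive\t#words_mother\t#nouns_mother\n"
    "Genesis 1:8\tJWM #NJ00 \tday second\tJWM \t#NJ00 \t1\t1\n"
)

def read_rows(handle):
    """Return every data row as a dict keyed by the column headers."""
    # QUOTE_NONE: the transcription is plain text, so no character is treated as a quote.
    reader = csv.DictReader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    return list(reader)

rows = read_rows(io.StringIO(sample))
print(rows[0]["passage"], "|", rows[0]["head"].strip(), "|", rows[0]["attributive"].strip())
```

For the real file one would pass an open file handle instead of the `StringIO` object. All values arrive as strings, so the two count columns need `int(...)` before any arithmetic.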

In [1]:
import sys, os
import collections

import laf
from laf.fabric import LafFabric
from etcbc.preprocess import prepare
fabric = LafFabric()

  0.00s This is LAF-Fabric 4.8.3
API reference: http://laf-fabric.readthedocs.org/en/latest/texts/API-reference.html
Feature doc: https://shebanq.ancient-data.org/static/docs/featuredoc/texts/welcome.html



# Loading the feature data

In [2]:
version = '4b'
API = fabric.load('etcbc{}'.format(version), 'lexicon', 'adjectives', {
    "xmlids": {"node": False, "edge": False},
    "features": ('''
        otype 
        function rela sp
        gloss
        g_word_utf8 trailer_utf8
        book chapter verse number
    ''',
    '''
        mother
    '''),
    "prepare": prepare,
    "primary": False,
}, verbose='DETAIL')
exec(fabric.localnames.format(var='fabric'))

  0.00s LOADING API: please wait ... 
  0.01s DETAIL: COMPILING m: etcbc4b: UP TO DATE
  0.01s USING main: etcbc4b DATA COMPILED AT: 2015-11-02T15-08-56
  0.01s DETAIL: COMPILING a: lexicon: UP TO DATE
  0.01s USING annox: lexicon DATA COMPILED AT: 2016-07-08T14-32-54
  0.03s DETAIL: load main: G.node_anchor_min
  0.12s DETAIL: load main: G.node_anchor_max
  0.21s DETAIL: load main: G.node_sort
  0.27s DETAIL: load main: G.node_sort_inv
  0.65s DETAIL: load main: G.edges_from
  0.71s DETAIL: load main: G.edges_to
  0.77s DETAIL: load main: F.etcbc4_db_otype [node] 
  1.36s DETAIL: load main: F.etcbc4_ft_function [node] 
  1.47s DETAIL: load main: F.etcbc4_ft_g_word_utf8 [node] 
  1.74s DETAIL: load main: F.etcbc4_ft_number [node] 
  2.61s DETAIL: load main: F.etcbc4_ft_rela [node] 
  3.15s DETAIL: load main: F.etcbc4_ft_sp [node] 
  3.49s DETAIL: load main: F.etcbc4_ft_trailer_utf8 [node] 
  3.75s DETAIL: load main: F.etcbc4_sft_book [node] 
  3.78s DETAIL: load main: F.etcbc4_sft_chap

# Collect data

We need phrases that act as a mother to one or more attributive subphrases.
That means that such a subphrase must have the 
[rela](https://shebanq.ancient-data.org/shebanq/static/docs/featuredoc/features/comments/rela.html)
feature set to `atr`. 

Let us first collect subphrases having `rela = atr`.

In [3]:
attr_subphrases = set()
inf('Finding subphrases ...')
for s in F.otype.s('subphrase'):
    if F.rela.v(s) != 'atr':
        continue
    attr_subphrases.add(s)
inf('{} attributive subphrases'.format(len(attr_subphrases)))

  7.65s Finding subphrases ...
  8.88s 3106 attributive subphrases
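
The selection step does not depend on LAF-Fabric itself. The sketch below shows the same filter as a set comprehension over plain Python data. The dict `rela_of` and its node numbers are invented for illustration and merely stand in for the `rela` feature lookup; `select_by_value` is a hypothetical helper.

```python
# Hypothetical stand-in for the rela feature: node number -> value (None when unset).
rela_of = {101: "atr", 102: None, 103: "rec", 104: "atr"}

def select_by_value(feature, wanted):
    """Return the set of nodes whose feature value equals `wanted`."""
    return {node for node, value in feature.items() if value == wanted}

attr_nodes = select_by_value(rela_of, "atr")
print(sorted(attr_nodes))
```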


Now let us add the mothers to those subphrases.
If there is no mother, we leave it out.
A subphrase should not have multiple mothers, but we'll check that anyway.

In [4]:
attr_subphrase_mother = dict()
multiple_mothers = set()
no_mothers = set()
for s in attr_subphrases:
    mothers = list(C.mother.v(s))
    if len(mothers) == 0:
        no_mothers.add(s)
        continue
    if len(mothers) > 1: 
        multiple_mothers.add(s)
        continue
    attr_subphrase_mother[s] = mothers[0]
if len(multiple_mothers):
    msg('{} subphrases with multiple mothers'.format(len(multiple_mothers)))
else:
    inf('No subphrases with multiple mothers')
if len(no_mothers):
    msg('{} subphrases without mothers'.format(len(no_mothers)))
else:
    inf('No subphrases without mothers')

inf('{} attributive subphrases with a single mother'.format(len(attr_subphrase_mother)))

    15s No subphrases with multiple mothers


    15s 12 subphrases without mothers


    15s 3094 attributive subphrases with a single mother
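
The three-way split above can be sketched on plain data as follows. The dict `mothers_of` and the node numbers are made up for illustration and stand in for the `mother` edge lookup; `partition_by_mother` is a hypothetical helper name.

```python
def partition_by_mother(nodes, mothers_of):
    """Split nodes into those with exactly one mother, none, or several."""
    single, missing, multiple = {}, set(), set()
    for node in nodes:
        mothers = list(mothers_of.get(node, ()))
        if not mothers:
            missing.add(node)
        elif len(mothers) > 1:
            multiple.add(node)
        else:
            single[node] = mothers[0]
    return single, missing, multiple

# Invented example: node 4 has no entry at all, node 2 has an empty list.
mothers_of = {1: [10], 2: [], 3: [11, 12]}
single, missing, multiple = partition_by_mother([1, 2, 3, 4], mothers_of)

# Every node must end up in exactly one of the three groups.
assert len(single) + len(missing) + len(multiple) == 4
print(single, sorted(missing), sorted(multiple))
```

The same consistency check holds for the real counts reported above: 3094 with a single mother, 12 without and 0 with multiple mothers add up to the 3106 attributive subphrases found earlier.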


Let us get some information about the mothers of those subphrases.
What kind of objects are they?

In [5]:
mother_types = collections.Counter()
for (s, m) in attr_subphrase_mother.items():
    mother_types[F.otype.v(m)] += 1

for t in sorted(mother_types):
    print('{:>4} subphrases with a mother of type {}'.format(mother_types[t], t))

3094 subphrases with a mother of type subphrase


So the mother is always a subphrase.
What about the length of that subphrase?

In [6]:
mother_length = collections.Counter()
for (s, m) in attr_subphrase_mother.items():
    mother_length[len(L.d('word', m))] +=1

for t in sorted(mother_length):
    print('{:>4} subphrases with a mother of length {:>2}'.format(mother_length[t], t))

2085 subphrases with a mother of length  1
 919 subphrases with a mother of length  2
  62 subphrases with a mother of length  3
  14 subphrases with a mother of length  4
  11 subphrases with a mother of length  5
   1 subphrases with a mother of length  7
   1 subphrases with a mother of length  8
   1 subphrases with a mother of length  9


How many nouns does the mother contain?

In [7]:
mother_nouns = collections.Counter()
for (s, m) in attr_subphrase_mother.items():
    mother_nouns[len([w for w in L.d('word', m) if F.sp.v(w) == 'subs'])] +=1

for t in sorted(mother_nouns):
    print('{:>4} subphrases with a mother having {:>2} nouns'.format(mother_nouns[t], t))

  63 subphrases with a mother having  0 nouns
2867 subphrases with a mother having  1 nouns
 137 subphrases with a mother having  2 nouns
  12 subphrases with a mother having  3 nouns
   6 subphrases with a mother having  4 nouns
   8 subphrases with a mother having  5 nouns
   1 subphrases with a mother having  6 nouns


# Generating output

Let us now assemble all data into the final output.
We also produce a row of column headers.

In [8]:
fields = '''
    passage
    phrase_text
    phrase_gloss
    head
    attributive
    #words_mother
    #nouns_mother
'''.strip().split()
nfields = len(fields)
row_template = ('{}\t' * (nfields - 1))+'{}\n'
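
As an illustration of what this template does, the sketch below builds the same template and fills it twice, once with the column names and once with a sample row. The sample values are copied from the output at the end of this notebook. This is a standalone sketch that does not need the loaded data.

```python
fields = [
    "passage", "phrase_text", "phrase_gloss", "head",
    "attributive", "#words_mother", "#nouns_mother",
]

# One '{}' slot per field, separated by tabs, terminated by a newline.
row_template = ("{}\t" * (len(fields) - 1)) + "{}\n"

header = row_template.format(*fields)
row = row_template.format("Genesis 1:8", "JWM #NJ00 ", "day second", "JWM ", "#NJ00 ", 1, 1)

print(repr(header))
print(repr(row))
```

Because `format` raises an `IndexError` when it receives fewer values than the template has slots, a fixed template catches a missing column at write time. Surplus values are silently ignored, so it does not guard against an extra one.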

In [9]:
of_path_template = 'attributives_{}.csv'
# 'ec' = ETCBC consonantal transcription, 'ha' = fully pointed Hebrew
for fmt in ['ec', 'ha']:
    of_path = of_path_template.format(fmt)
    # explicit UTF-8 so the pointed Hebrew is written the same way on every platform;
    # the with-block closes the file even if writing a row fails
    with open(of_path, 'w', encoding='utf-8') as of:
        of.write('{}\n'.format('\t'.join(fields)))
        for s in sorted(attr_subphrase_mother, key=NK):
            sw = list(L.d('word', s))
            p = L.u('phrase', s)
            pw = list(L.d('word', p))
            m = attr_subphrase_mother[s]
            mw = list(L.d('word', m))

            of.write(row_template.format(
                T.passage(s),
                T.words(pw, fmt=fmt).replace('\n', ' '),
                ' '.join(F.gloss.v(w) for w in pw),
                T.words(mw, fmt=fmt).replace('\n', ' '),
                T.words(sw, fmt=fmt).replace('\n', ' '),
                len(mw),
                len([w for w in mw if F.sp.v(w) == 'subs']),
            ))

    inf('Written {} lines to {}'.format(len(attr_subphrase_mother) + 1, of_path))

    25s Written 3095 lines to attributives_ec.csv
    25s Written 3095 lines to attributives_ha.csv


# Results

The results are available in two versions:
[ETCBC consonantal](attributives_ec.csv)
and
[fully pointed Hebrew](attributives_ha.csv).

Screenshot made in the Numbers program:

<img align="left" src="attributives_numbers.png"/>

In [10]:
print(open(of_path_template.format('ec')).read()[0:1000])

passage	phrase_text	phrase_gloss	head	attributive	#words_mother	#nouns_mother
Genesis 1:8	JWM #NJ00 	day second	JWM 	#NJ00 	1	1
Genesis 1:13	JWM #LJ#J00 	day third	JWM 	#LJ#J00 	1	1
Genesis 1:16	>T&#NJ HM>RT HGDLJM 	<object marker> two the lamp the great	HM>RT 	HGDLJM 	2	1
Genesis 1:16	>T&HM>WR HGDL LMM#LT HJWM 	<object marker> the lamp the great to dominion the day	HM>WR 	HGDL 	2	1
Genesis 1:16	>T&HM>WR HQVN LMM#LT HLJLH 	<object marker> the lamp the small to dominion the night	HM>WR 	HQVN 	2	1
Genesis 1:19	JWM RBJ<J00 	day fourth	JWM 	RBJ<J00 	1	1
Genesis 1:20	#RY NP# XJH 	swarming creatures soul alive	NP# 	XJH 	1	1
Genesis 1:21	>T&HTNJNM HGDLJM W>T KL&NP# 	<object marker> the sea-monster the great and <object marker> whole soul	HTNJNM 	HGDLJM 	2	1
Genesis 1:23	JWM XMJ#J00 	day fifth	JWM 	XMJ#J00 	1	1
Genesis 1:24	NP# XJH 	soul alive	NP# 	XJH 	1	1
Genesis 1:30	NP# XJH 	soul alive	NP# 	XJH 	1	1
Genesis 2:2	BJWM H#BJ<J 	in the day the seventh	JWM 	H#BJ<J 	2	1
Genesis 2:2	BJWM H#BJ<J 	i