"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"text/html": [
"result 1"
],
"text/plain": [
""
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"text/html": [
""
],
"text/plain": [
""
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"A.table(results, withNodes=True, end=5)\n",
"A.show(results, start=1, end=1)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**NB**\n",
"Gaps are a tricky phenomenon. In [gaps](searchGaps.ipynb) we will deal with them cruelly."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Performance tuning\n",
"\n",
"Here is an example by Yanniek van der Schans (2018-09-21)."
]
},
{
"cell_type": "code",
"execution_count": 32,
"metadata": {},
"outputs": [],
"source": [
"query = \"\"\"\n",
"c:clause\n",
" PreGap:phrase_atom\n",
" LastPhrase:phrase_atom\n",
" :=\n",
"\n",
"Gap:clause_atom\n",
" :: word\n",
"\n",
"PreGap < Gap\n",
"Gap < LastPhrase\n",
"c || Gap\n",
"\"\"\""
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Here are the current settings of the performance parameters:"
]
},
{
"cell_type": "code",
"execution_count": 33,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Performance parameters, current values:\n",
"\ttryLimitFrom = 40\n",
"\ttryLimitTo = 40\n",
"\tyarnRatio = 1.25\n"
]
}
],
"source": [
"S.tweakPerformance()"
]
},
{
"cell_type": "code",
"execution_count": 34,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
" 0.00s Checking search template ...\n",
" 0.00s Setting up search space for 5 objects ...\n",
" 0.13s Constraining search space with 8 relations ...\n",
" 0.29s \t2 edges thinned\n",
" 0.29s Setting up retrieval plan with strategy small_choice_multi ...\n",
" 0.30s Ready to deliver results from 454184 nodes\n",
"Iterate over S.fetch() to get the results\n",
"See S.showPlan() to interpret the results\n",
"Search with 5 objects and 8 relations\n",
"Results are instantiations of the following objects:\n",
"node 0-clause 88131 choices\n",
"node 1-phrase_atom 267532 choices\n",
"node 2-phrase_atom 88131 choices\n",
"node 3-clause_atom 5195 choices\n",
"node 4-word 5195 choices\n",
"Performance parameters:\n",
"\tyarnRatio = 1.25\n",
"\ttryLimitFrom = 40\n",
"\ttryLimitTo = 40\n",
"Instantiations are computed along the following relations:\n",
"node 3-clause_atom 5195 choices\n",
"edge 3-clause_atom [[ 4-word 1.0 choices\n",
"edge 3-clause_atom :: 4-word 0 choices\n",
"edge 3-clause_atom < 2-phrase_atom 44065.5 choices\n",
"edge 2-phrase_atom := 0-clause 1.0 choices (thinned)\n",
"edge 2-phrase_atom ]] 0-clause 0 choices\n",
"edge 0-clause || 3-clause_atom 0 choices\n",
"edge 0-clause [[ 1-phrase_atom 2.7 choices\n",
"edge 1-phrase_atom < 3-clause_atom 0 choices\n",
" 0.31s The results are connected to the original search template as follows:\n",
" 0 \n",
" 1 R0 c:clause\n",
" 2 R1 PreGap:phrase_atom\n",
" 3 R2 LastPhrase:phrase_atom\n",
" 4 :=\n",
" 5 \n",
" 6 R3 Gap:clause_atom\n",
" 7 R4 :: word\n",
" 8 \n",
" 9 PreGap < Gap\n",
"10 Gap < LastPhrase\n",
"11 c || Gap\n",
"12 \n"
]
}
],
"source": [
"S.study(query)\n",
"S.showPlan(details=True)"
]
},
{
"cell_type": "code",
"execution_count": 35,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
" 0.00s Counting results per 1 up to 3 ...\n",
" | 0.00s 1\n",
" | 0.00s 2\n",
" | 1.62s 3\n",
" 3.32s Done: 4 results\n"
]
}
],
"source": [
"S.count(progress=1, limit=3)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Can we do better?\n",
"\n",
"The performance parameter `yarnRatio` can be used to increase the amount of pre-processing, and we can\n",
"increase the number of random samples that we make by means of `tryLimitFrom` and `tryLimitTo`.\n",
"\n",
"We start by increasing the amount of up-front edge-spinning: we lower `yarnRatio` from 1.25 to 0.2 and raise both try limits from 40 to 10000."
]
},
{
"cell_type": "code",
"execution_count": 36,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Performance parameters, current values:\n",
"\ttryLimitFrom = 10000\n",
"\ttryLimitTo = 10000\n",
"\tyarnRatio = 0.2\n"
]
}
],
"source": [
"S.tweakPerformance(yarnRatio=0.2, tryLimitFrom=10000, tryLimitTo=10000)"
]
},
{
"cell_type": "code",
"execution_count": 37,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
" 0.00s Checking search template ...\n",
" 0.00s Setting up search space for 5 objects ...\n",
" 0.12s Constraining search space with 8 relations ...\n",
" 0.41s \t2 edges thinned\n",
" 0.41s Setting up retrieval plan with strategy small_choice_multi ...\n",
" 0.50s Ready to deliver results from 454184 nodes\n",
"Iterate over S.fetch() to get the results\n",
"See S.showPlan() to interpret the results\n",
"Search with 5 objects and 8 relations\n",
"Results are instantiations of the following objects:\n",
"node 0-clause 88131 choices\n",
"node 1-phrase_atom 267532 choices\n",
"node 2-phrase_atom 88131 choices\n",
"node 3-clause_atom 5195 choices\n",
"node 4-word 5195 choices\n",
"Performance parameters:\n",
"\tyarnRatio = 0.2\n",
"\ttryLimitFrom = 10000\n",
"\ttryLimitTo = 10000\n",
"Instantiations are computed along the following relations:\n",
"node 3-clause_atom 5195 choices\n",
"edge 3-clause_atom [[ 4-word 1.0 choices\n",
"edge 3-clause_atom :: 4-word 0 choices\n",
"edge 3-clause_atom < 2-phrase_atom 44065.5 choices\n",
"edge 2-phrase_atom := 0-clause 1.0 choices (thinned)\n",
"edge 2-phrase_atom ]] 0-clause 0 choices\n",
"edge 0-clause || 3-clause_atom 0 choices\n",
"edge 0-clause [[ 1-phrase_atom 3.0 choices\n",
"edge 1-phrase_atom < 3-clause_atom 0 choices\n",
" 0.50s The results are connected to the original search template as follows:\n",
" 0 \n",
" 1 R0 c:clause\n",
" 2 R1 PreGap:phrase_atom\n",
" 3 R2 LastPhrase:phrase_atom\n",
" 4 :=\n",
" 5 \n",
" 6 R3 Gap:clause_atom\n",
" 7 R4 :: word\n",
" 8 \n",
" 9 PreGap < Gap\n",
"10 Gap < LastPhrase\n",
"11 c || Gap\n",
"12 \n"
]
}
],
"source": [
"S.study(query)\n",
"S.showPlan(details=True)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
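"It seems to be the same plan: the same strategy and the same order of edges. Only the estimate for the edge `0-clause [[ 1-phrase_atom` changes slightly, from 2.7 to 3.0 choices."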
]
},
{
"cell_type": "code",
"execution_count": 38,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
" 0.00s Counting results per 1 up to 3 ...\n",
" | 0.00s 1\n",
" | 0.00s 2\n",
" | 1.58s 3\n",
" 3.29s Done: 4 results\n"
]
}
],
"source": [
"S.count(progress=1, limit=3)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"No improvement: counting takes 3.29s, against 3.32s with the default settings."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"What if we decrease the amount of edge spinning by raising `yarnRatio` to 5?"
]
},
{
"cell_type": "code",
"execution_count": 39,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Performance parameters, current values:\n",
"\ttryLimitFrom = 10000\n",
"\ttryLimitTo = 10000\n",
"\tyarnRatio = 5\n"
]
}
],
"source": [
"S.tweakPerformance(yarnRatio=5, tryLimitFrom=10000, tryLimitTo=10000)"
]
},
{
"cell_type": "code",
"execution_count": 40,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
" 0.00s Checking search template ...\n",
" 0.00s Setting up search space for 5 objects ...\n",
" 0.14s Constraining search space with 8 relations ...\n",
" 0.32s \t2 edges thinned\n",
" 0.32s Setting up retrieval plan with strategy small_choice_multi ...\n",
" 0.41s Ready to deliver results from 454184 nodes\n",
"Iterate over S.fetch() to get the results\n",
"See S.showPlan() to interpret the results\n",
"Search with 5 objects and 8 relations\n",
"Results are instantiations of the following objects:\n",
"node 0-clause 88131 choices\n",
"node 1-phrase_atom 267532 choices\n",
"node 2-phrase_atom 88131 choices\n",
"node 3-clause_atom 5195 choices\n",
"node 4-word 5195 choices\n",
"Performance parameters:\n",
"\tyarnRatio = 5\n",
"\ttryLimitFrom = 10000\n",
"\ttryLimitTo = 10000\n",
"Instantiations are computed along the following relations:\n",
"node 3-clause_atom 5195 choices\n",
"edge 3-clause_atom [[ 4-word 1.0 choices\n",
"edge 3-clause_atom :: 4-word 0 choices\n",
"edge 3-clause_atom < 2-phrase_atom 44065.5 choices\n",
"edge 2-phrase_atom := 0-clause 1.0 choices (thinned)\n",
"edge 2-phrase_atom ]] 0-clause 0 choices\n",
"edge 0-clause || 3-clause_atom 0 choices\n",
"edge 0-clause [[ 1-phrase_atom 3.0 choices\n",
"edge 1-phrase_atom < 3-clause_atom 0 choices\n",
" 0.42s The results are connected to the original search template as follows:\n",
" 0 \n",
" 1 R0 c:clause\n",
" 2 R1 PreGap:phrase_atom\n",
" 3 R2 LastPhrase:phrase_atom\n",
" 4 :=\n",
" 5 \n",
" 6 R3 Gap:clause_atom\n",
" 7 R4 :: word\n",
" 8 \n",
" 9 PreGap < Gap\n",
"10 Gap < LastPhrase\n",
"11 c || Gap\n",
"12 \n"
]
}
],
"source": [
"S.study(query)\n",
"S.showPlan(details=True)"
]
},
{
"cell_type": "code",
"execution_count": 41,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
" 0.00s Counting results per 1 up to 3 ...\n",
" | 0.00s 1\n",
" | 0.00s 2\n",
" | 1.61s 3\n",
" 3.33s Done: 4 results\n"
]
}
],
"source": [
"S.count(progress=1, limit=3)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Again, no improvement: 3.33s this time."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"In the future, we'll look for queries where these parameters matter more."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Here is how to reset the performance parameters to their default values: pass `None` for each of them."
]
},
{
"cell_type": "code",
"execution_count": 42,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Performance parameters, current values:\n",
"\ttryLimitFrom = 40\n",
"\ttryLimitTo = 40\n",
"\tyarnRatio = 1.25\n"
]
}
],
"source": [
"S.tweakPerformance(yarnRatio=None, tryLimitFrom=None, tryLimitTo=None)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Next\n",
"\n",
"You have seen cases where the implementation is to blame.\n",
"\n",
"Now I want to point to gaps in your understanding:\n",
"[gaps](searchGaps.ipynb)\n",
"\n",
"---\n",
"\n",
"[basic](search.ipynb)\n",
"[advanced](searchAdvanced.ipynb)\n",
"[sets](searchSets.ipynb)\n",
"[relations](searchRelations.ipynb)\n",
"[quantifiers](searchQuantifiers.ipynb)\n",
"rough\n",
"[gaps](searchGaps.ipynb)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# All steps\n",
"\n",
"* **[start](start.ipynb)** your first step in mastering the Bible computationally\n",
"* **[display](display.ipynb)** become an expert in creating pretty displays of your text structures\n",
"* **[search](search.ipynb)** turbocharge your hand-coding with search templates\n",
"\n",
"---\n",
"\n",
"[advanced](searchAdvanced.ipynb)\n",
"[sets](searchSets.ipynb)\n",
"[relations](searchRelations.ipynb)\n",
"[quantifiers](searchQuantifiers.ipynb)\n",
"[from MQL](searchFromMQL.ipynb)\n",
"rough\n",
"[gaps](searchGaps.ipynb)\n",
"\n",
"---\n",
"\n",
"* **[export Excel](exportExcel.ipynb)** make tailor-made spreadsheets out of your results\n",
"* **[share](share.ipynb)** draw in other people's data and let them use yours\n",
"* **[export](export.ipynb)** export your dataset as an Emdros database\n",
"* **[annotate](annotate.ipynb)** annotate plain text by means of other tools and import the annotations as TF features\n",
"* **[map](map.ipynb)** map somebody else's annotations to a new version of the corpus\n",
"* **[volumes](volumes.ipynb)** work with selected books only\n",
"* **[trees](trees.ipynb)** work with the BHSA data as syntax trees\n",
"\n",
"CC-BY Dirk Roorda"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.11.1"
},
"widgets": {
"application/vnd.jupyter.widget-state+json": {
"state": {},
"version_major": 2,
"version_minor": 0
}
}
},
"nbformat": 4,
"nbformat_minor": 4
}