\".join(\n",
" \"{} | \".format(\"\".join(str(_) for _ in row))\n",
" for row in table\n",
" )\n",
" )\n",
" )\n",
" )"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Get counterparts\n",
"\n",
"Here is a function that gets the counterparts of phrases between two versions, by following the `omap@` edge feature that maps nodes of the source version to nodes of the target version.\n",
"\n",
"`phraseMapping` is keyed by a (source version, target version) pair,\n",
"then by phrase node in the source version; the value is the list of\n",
"`(node in target version, dissimilarity)` pairs delivered by the mapping.\n",
"When computing statistics, `showStats` treats a dissimilarity of `None` as 0.\n",
"\n",
"Source nodes that lack a counterpart are left out of the mapping."
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"phraseMapping = collections.OrderedDict()"
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"def getPhrases(v, w):\n",
"    V = api[v]\n",
"    W = api[w]\n",
"    mapVW = \"omap@{}-{}\".format(v, w)\n",
"    vKey = (v, w)\n",
"\n",
"    phraseMapping[vKey] = {}\n",
"    phrases = phraseMapping[vKey]\n",
"\n",
"    for n in V.F.otype.s(\"phrase\"):\n",
"        ms = W.Es(mapVW).f(n)\n",
"        # skip phrases without a counterpart (no result or an empty one)\n",
"        if ms:\n",
"            phrases[n] = ms"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We also want to see the evolution in one big leap, so we construct a mapping from the first version to the last\n",
"by composing the individual `omap@` mappings between consecutive versions.\n",
"\n",
"Following a phrase through the versions may lead to multiple counterparts.\n",
"When that happens at an intermediate step, we continue with the counterpart of lowest dissimilarity (highest similarity) and ignore the rest."
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {
"collapsed": true,
"lines_to_end_of_cell_marker": 2
},
"outputs": [],
"source": [
"def composeMap(curMap, newStep):\n",
"    resultMap = {}\n",
"    for (n, ms) in curMap.items():\n",
"        # continue with the counterpart of lowest dissimilarity (None counts as 0)\n",
"        theM = (\n",
"            ms[0][0]\n",
"            if len(ms) == 1\n",
"            else sorted(ms, key=lambda x: (x[1] or 0, x[0]))[0][0]\n",
"        )\n",
"        # drop phrases whose chosen counterpart has no counterpart in the next version\n",
"        if theM in newStep:\n",
"            resultMap[n] = newStep[theM]\n",
"    return resultMap\n",
"\n",
"\n",
"def getFirstLastMapping():\n",
"    if len(versions) <= 2:\n",
"        return {}\n",
"    curMap = phraseMapping[(versions[0], versions[1])]\n",
"\n",
"    for i in range(2, len(versions)):\n",
"        caption(0, \"mapping from {} to {}\".format(versions[0], versions[i]))\n",
"        curMap = composeMap(curMap, phraseMapping[(versions[i - 1], versions[i])])\n",
"    phraseMapping[(versions[0], versions[-1])] = curMap"
]
},
{
"cell_type": "markdown",
"metadata": {
"lines_to_next_cell": 2
},
"source": [
"# Table of boundary changes"
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {
"collapsed": true,
"lines_to_next_cell": 2
},
"outputs": [],
"source": [
"def showStats(v, w):\n",
"    vKey = (v, w)\n",
"    phrases = phraseMapping[vKey]\n",
"    # collect the distinct target phrases per dissimilarity (None counts as 0)\n",
"    dists = {}\n",
"    for (n, ms) in phrases.items():\n",
"        for (m, dis) in ms:\n",
"            dists.setdefault(dis or 0, set()).add(m)\n",
"    stats = collections.Counter()\n",
"    for (dis, ms) in dists.items():\n",
"        stats[dis] = len(ms)\n",
"    table = []\n",
"    table.append([\"dissimilarity\", \"number of phrases\"])\n",
"    for dis in range(0, max(stats) + 1):\n",
"        table.append([dis, stats.get(dis, \"\")])\n",
"    tableText(table)"
]
},
{
"cell_type": "markdown",
"metadata": {
"collapsed": true,
"lines_to_next_cell": 2
},
"source": [
"# Table of old and new values\n",
"We visualize the changes in the values of a feature, such as `function`,\n",
"by generating a matrix with the old values in the row headers\n",
"and the new values in the column headers. Each cell contains the number of times\n",
"that the old value has changed into the new value."
]
},
{
"cell_type": "code",
"execution_count": 9,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"from functools import reduce\n",
"\n",
"\n",
"def featureDiff(v, w, feat):\n",
"    V = api[v]\n",
"    W = api[w]\n",
"    vKey = (v, w)\n",
"    vFeat = versionInfo[v][feat]\n",
"    wFeat = versionInfo[w][feat]\n",
"    phrases = phraseMapping[vKey]\n",
"\n",
"    # combis[oldValue][newValue] = number of mapped phrase pairs with that change\n",
"    combis = {}\n",
"    for (n, ms) in phrases.items():\n",
"        vVal = V.Fs(vFeat).v(n)\n",
"        for (m, dis) in ms:\n",
"            wVal = W.Fs(wFeat).v(m)\n",
"            combis.setdefault(vVal, collections.Counter())[wVal] += 1\n",
"    vValues = sorted(combis.keys())\n",
"    wValues = sorted(reduce(set.union, [set(combis[vVal]) for vVal in vValues], set()))\n",
"    table = []\n",
"    table.append([\"{}\\\\{}\".format(v, w)] + wValues)\n",
"    for vVal in vValues:\n",
"        table.append([vVal] + [str(combis[vVal].get(wVal, \"\")) for wVal in wValues])\n",
"    tableText(table)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Collect\n",
"We collect the phrase mappings between consecutive versions, plus the composed mapping from the first to the last version, in `phraseMapping`."
]
},
{
"cell_type": "code",
"execution_count": 10,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"..............................................................................................\n",
". 55s Collecting data .\n",
"..............................................................................................\n",
"| 55s \t3 => 4 \n",
"| 57s \t4 => 4b \n",
"| 58s \t4b => 2016\n",
"| 1m 00s \t2016 => 2017\n",
"| 1m 02s \t3 => 2017\n",
"| 1m 02s mapping from 3 to 4b\n",
"| 1m 02s mapping from 3 to 2016\n",
"| 1m 02s mapping from 3 to 2017\n",
"| 1m 02s Done\n"
]
}
],
"source": [
"caption(4, \"Collecting data\")\n",
"for (i, w) in enumerate(versions):\n",
"    if i == 0:\n",
"        continue\n",
"    v = versions[i - 1]\n",
"    caption(0, \"\\t{:<4} => {:<4}\".format(v, w))\n",
"    getPhrases(v, w)\n",
"\n",
"caption(0, \"\\t{:<4} => {:<4}\".format(versions[0], versions[-1]))\n",
"getFirstLastMapping()\n",
"caption(0, \"Done\")"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.6.2"
}
},
"nbformat": 4,
"nbformat_minor": 2
}